Contract  Number  N00039-81-0663  (MIT  #  91445) 
Internal  Report  Number  M010-8205-10 
Deliverable  Number  "G 


VIRTUAL  INFORMATION  FACILITY 
OF  THE  INFOPLEX  SOFTWARE  TEST  VEHICLE 
(PART  I) 


Technical  Report  #10 


By

Jameson  Lee 
May,  1982 


Principal  Investigator: 
Professor  Stuart  E.  Madnick 


Prepared  for: 

Naval  Electronics  Systems  Command 
Washington,  D.C. 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT  DOCUMENTATION  PAGE 


REPORT  NUMBER 


Technical  Report  #10 


4. TITLE (and Subtitle)


Virtual  Information  Facility  of  the 
INFOPLEX  Software  Test  Vehicle 


7. AUTHOR(s)


Jameson  Lee 


9. PERFORMING ORGANIZATION NAME AND ADDRESS


Sloan  School  of  Management,  MIT 
50  Memorial  Drive,  Cambridge,  MA  02139 


11. CONTROLLING OFFICE NAME AND ADDRESS


READ  INSTRUCTIONS 
BEFORE  COMPLETING  FORM 


3.  RECIPIENT'S  CATALOG  NUMBER 


5. TYPE OF REPORT & PERIOD COVERED


6. PERFORMING ORG. REPORT NUMBER

M010-8205-10 


8. CONTRACT OR GRANT NUMBER(s)

N0039-81-C-0663 


10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS


12.  REPORT  DATE 


May  1982 


13. NUMBER OF PAGES
180 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)    15. SECURITY CLASS. (of this report)

unclassified 

15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE


16. DISTRIBUTION STATEMENT (of this Report)

Approved  for  public  release;  distribution  unlimited 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)


database  computer,  database  management  system,  Software 
Test  Vehicle,  hierarchical  system,  virtual  information 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

This report describes the software design and implementation of the
front-end for the Virtual Information Facility of the INFOPLEX database computer.

It  is  part  of  a  major  effort  to  develop  a  software  simulation,  called 
Software  Test  Vehicle,  for  the  underlying  architecture  of  INFOPLEX. 

The  virtual  information  facility  is  a  single  level  of  operations  situated 
within  the  Functional  Hierarchy.  It  supports  the  use  of  virtual  information, 
a virtual entity based on procedural relationships and derivations from




physically recorded data. Upon completion, this facility will be
integrated within the current implementation of the STV for the INFOPLEX
Functional Hierarchy, which lacks support for virtual information
processing.



Virtual  Information  Facility 
of  the  INFOPLEX  Software  Test  Vehicle 


by 

JAMESON  LEE 

Submitted  to  the  Department  of  Electrical 
Engineering  and  Computer  Science  in  May, 

1982,  in  partial  fulfillment  of  the 
requirements  for  the  degree  of 
Bachelor  of  Science 

Abstract 

This thesis is a software design and implementation of the
front-end for the Virtual Information Facility of the INFOPLEX
data base computer. It is part of a major effort to develop a
software simulation, a so-called Software Test Vehicle (STV),
for the underlying architecture of INFOPLEX.

INFOPLEX is a hierarchical architecture for data base computers,
based on functional decomposition of data base operations. It is
a current research project of the Information Systems Group at
M.I.T.'s Sloan School of Management. Within the INFOPLEX
architecture, a functional hierarchy of information management
functions is built on top of a storage hierarchy of information
storage functions. These two independent hierarchies are further
divided into many sub-levels, each of which is devoted to a more
specific function of data base activities.


The virtual information facility is a single level of operations
situated within the functional hierarchy. It supports the use of
virtual information, a virtual entity based on procedural
relationships and derivations from physically recorded data. Upon
completion, this facility will be integrated within the current
implementation of the STV for the INFOPLEX functional hierarchy,
which lacks support for virtual information processing.
1.0.0 INTRODUCTION


The INFOPLEX data base computer is a current research project of
the Information Systems Group at M.I.T.'s Sloan School of
Management. It proposes a new architecture whose objectives are
to provide substantial improvements in information management
performance over conventional computer architectures, and to
provide highly reliable support for very large and complex data
bases.

1.1.0  INFOPLEX  OVERVIEW 

The progress of modern society has placed increasingly new and
challenging demands upon the capability and performance of
information storage, retrieval, and management. Conventional
computers, whose architecture is designed primarily for
computational objectives, are not suited to meet the requirements
of these new demands. Efforts to build computer systems that will
suit our information needs, today and in the future, have been
made in four different areas: (1) new instructions through
microprogramming, (2) intelligent controllers, (3) dedicated
computers for data base operations, and (4) data base computers.
INFOPLEX is a research project belonging to the fourth category.
1.1.1 CONCEPT


INFOPLEX employs the concept of hierarchical decomposition, which
organizes information management functions into a functional
hierarchy and the physical memory management functions into a
storage hierarchy (Madnick 78). Both hierarchies consist of many
independent levels of operation, each of which supports a
different set of information or storage management functions
through the use of multiple microprocessors.

1.1.2  INFOPLEX  ARCHITECTURE 

As stated previously, INFOPLEX is an architecture for data base
computers based on hierarchical decomposition. A functional
hierarchy of information management functions is built on top of
a hierarchy of information storage functions. Both hierarchies
are further divided into many functionally independent levels of
operation, each of which is to be supported by a set of
microprocessors operating in parallel with one another. A global
Communication Bus coordinates inter-level transmission of data.
This hierarchical architecture exploits the functional modularity
of operations and the parallel processing of microprocessors to
systematize data base activities and to achieve a prescribed
level of efficiency. A graphical illustration of this
architecture is presented in figure 1.1.
1.1.3 FUNCTIONAL HIERARCHY


The current architecture of the functional hierarchy (Hsu 1982),
with respect to data abstraction, consists of four separate
levels: (1) external level, (2) conceptual level, (3) entity
level, and (4) internal level. A part of the conceptual level is
a virtual information facility (Hsu 1982). These four levels of
information management are highly independent of one another,
and each is responsible for a different but necessary phase of
information processing in a data base computer.


1.1.4  RESEARCH  ISSUES 


Major efforts of INFOPLEX research are devoted to the design,
modeling, and evaluation of an optimal decomposition strategy for
both the functional and the memory hierarchies of information
management and storage operations, and also to the study of an
associated distributed control mechanism. This control mechanism
would be used to coordinate the activities of, and the
inter-level communications within, the hierarchies.

1.2.0  THESIS  OBJECTIVE 

This thesis shares a joint mission with a concurrent thesis by
Peter Lu. The two theses are entirely separate in functionality,
but closely related to and dependent upon one another for a
complete software simulation of the virtual information facility
on the INFOPLEX data base computer architecture. This facility
would incorporate the design and implementation of two sub-levels
of the INFOPLEX functional hierarchy: the virtual information
level, and a user interface level tailored for virtual
information processing.

This thesis is responsible for fulfilling the front-end
objectives of the joint mission; the front-end objectives
include the design and implementation of the following:

a) A data base language to support virtual information.

b) A finite state machine to parse data base statements
written in this language.

c) A user interface tailored to the use of virtual
information.

d) A processor to handle the creation, listing, and
modification of virtual definitions, as well as the
substitution of these definitions into data base
statements in actual use. This processor would also be
responsible for transforming data base statements into a
chain of tokens, each of which would include an indicator
describing the classification of the token according to
a prescribed classification scheme.

The combined objectives of this "front-end" and Peter Lu's
"back-end" would fulfill our joint mission as mentioned earlier,
namely, to construct in software a virtual information facility
with its own user interface, referred to hereafter as VIFI, the
Virtual Information Interpreter.

1.2.1  BACKGROUND 

In the three short months in which VIFI was developed, we strove
to bring a certain degree of professionalism to its design and
implementation. The merits of modular programming, innovative
algorithms, performance efficiency, functional capabilities,
user-friendliness of the proposed data base language, program
organization and flexibility, and even consistency in programming
style were weighed against limitations of time and labor. A
serious attempt was made to incorporate all of these
characteristics into our Virtual Information Interpreter.
While making these considerations, many sleepless nights of
unceasing arguments plagued the two developers; it was the
intrinsic dissension between the idealist and the pragmatist.
At a certain point, these disagreements grew so severe that they
appeared to have left an unpleasant mark on a very close and
strongly bonded friendship. However, a lesson of humanity was
learned from this experience, and our cherished friendship will
continue to grow and become stronger than ever before, because we
have acknowledged a feeling of faith and destiny which was
manifested through this experience. I am expressing this
sentiment here because I consider it the most personally
meaningful and lasting reward of this thesis.
2.0.0 VIRTUAL INFORMATION

2.1.0 CONCEPT

The concept of virtual information in data base systems has been
developed and examined in earlier research of the Information
Systems Group. Basically, there is a spectrum of the kinds of
information that may be retrieved from a data base. Along this
spectrum, pure data occupy the extreme at one end, and pure
algorithms occupy the extreme at the other. Between these two
extremes lies information that may be derived from a combination
of data and algorithms; such information is dynamic and
procedural in nature, and is referred to as Virtual Information.

2.2.0  CLASSIFICATION 

Virtual information may be categorized into three major classes:
factored facts, inferred facts, and computed facts. Together,
these three classes of virtual information, and combinations
thereof, constitute the portion of the information spectrum
between the two extremes of pure data and pure algorithms.


2.2.1  FACTORED  FACTS 
Factored facts are subsets of data elements selected by certain
prescribed conditions, or so-called predicates, on attribute
values; they are often very valuable in structuring information
in a useful manner. For instance, if a certain data base
maintains records of weight, hair color, and salary for a group
of employees, it may be useful to select from this group those
individuals who satisfy a certain condition on their attribute
values, such as having black hair, making a salary greater than
8 dollars per hour, or weighing over 300 pounds. It is important
that users be able to access information independently of the
particular factoring involved; this implies the ability to
support multi-level factoring, that is, repeated factoring of
data.
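As an illustration of factoring and multi-level factoring, the following minimal Python sketch selects subsets of records by predicates on attribute values and then factors an already factored set again. The record layout, the `factor` helper, and the sample data are hypothetical; this is not the INFOPLEX implementation.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]           # assumed representation: attribute name -> value
Predicate = Callable[[Record], bool]

def factor(records: List[Record], predicate: Predicate) -> List[Record]:
    """Return the factored subset of records that satisfy the predicate."""
    return [r for r in records if predicate(r)]

employees = [
    {"name": "Ames", "hair": "black", "salary": 9.50, "weight": 180},
    {"name": "Baker", "hair": "brown", "salary": 7.25, "weight": 310},
    {"name": "Cole", "hair": "black", "salary": 6.00, "weight": 305},
]

# First-level factoring: employees with black hair.
black_haired = factor(employees, lambda r: r["hair"] == "black")
# Second-level factoring, applied to the factored set itself.
heavy_black_haired = factor(black_haired, lambda r: r["weight"] > 300)

print([r["name"] for r in heavy_black_haired])  # ['Cole']
```

Because `factor` accepts any list of records, a factored set can be factored again without the caller knowing how it was produced, which is the independence the paragraph asks for.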

2.2.2  COMPUTED  FACTS 

Computed facts are information obtainable through the application
of particular computational algorithms and operators on data or
groups of data. These operators include arithmetic, comparative,
boolean, and other kinds of functions. At the very least,
computed facts include pure data manifested in a different form,
with a different unit of measure, or under an alias name. For
instance, a user may define a virtual age attribute to be the
difference between the current year and a person's birth-year, a
virtual rectangular area attribute to be the length multiplied by
the width, or an attribute value in inches to be 12 times the
attribute value in feet. In this sense, transformations between
different units of measure are intrinsic to the operations of
computed facts.
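The three computed facts mentioned above can be sketched as small derivation procedures over stored attributes. The function names and attribute names (`birth_year`, `length_ft`, and so on) are assumptions made for illustration only.

```python
from datetime import date
from typing import Optional

def age(person: dict, current_year: Optional[int] = None) -> int:
    """Virtual 'age': the current year minus the stored birth year."""
    year = current_year if current_year is not None else date.today().year
    return year - person["birth_year"]

def area(rect: dict) -> float:
    """Virtual rectangular 'area': length multiplied by width."""
    return rect["length"] * rect["width"]

def length_in_inches(item: dict) -> float:
    """Unit transformation: 12 times the stored value in feet."""
    return 12 * item["length_ft"]

person = {"name": "Smith", "birth_year": 1950}
print(age(person, current_year=1982))        # 32
print(area({"length": 3.0, "width": 4.0}))   # 12.0
print(length_in_inches({"length_ft": 2.5}))  # 30.0
```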

2.2.3  INFERRED  FACTS 

Inferred facts pertain to implicit relationships which the data
base system may arrive at through certain levels of indirection.
In other words, a path, although indirect, does exist which leads
to the desired data in storage. There are two ways by which the
system on its own can support this kind of virtual information.
The first method is an exhaustive search of all possible paths,
and the second is the application of a certain degree of
artificial intelligence to deduce a viable path to the target
data. The first method is unbounded in computing time, and even
when a path is found, it may not be the correct path; the second
method is far-fetched at this time. Therefore, we will give our
attention to a different but comparable set of inferred facts
which is implementable, and we give it the name Pseudo Inferred
Facts. Pseudo Inferred Facts are exactly the same as inferred
facts except that all the indirections are explicitly designated
by the user. With this strategy, neither exhaustive search nor
artificial intelligence is necessary, and the specified path is
always the designated, correct path. For instance, the Uncle
relationship may be defined as the application of the Brother
relationship after the application of the Mother relationship.
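A pseudo inferred fact amounts to composing stored relationships along a path the user designates. The sketch below, in Python with hypothetical relationship tables and helper names, defines Uncle as Brother applied after Mother; no search is performed, because the path is given.

```python
from typing import Callable, Dict, List

Relation = Callable[[str], List[str]]

# Hypothetical stored (real) relationships: person -> related people.
mother_of: Dict[str, List[str]] = {"Ann": ["Beth"], "Carl": ["Beth"]}
brother_of: Dict[str, List[str]] = {"Beth": ["Dan", "Ed"]}

def stored(table: Dict[str, List[str]]) -> Relation:
    """Wrap a stored relationship table as a relation function."""
    return lambda x: table.get(x, [])

def compose(first: Relation, then: Relation) -> Relation:
    """Apply `first`, then apply `then` to every intermediate result."""
    def composed(x: str) -> List[str]:
        result: List[str] = []
        for mid in first(x):
            for y in then(mid):
                if y not in result:  # keep each related person once
                    result.append(y)
        return result
    return composed

# Uncle = Brother applied after Mother; the user designates the path explicitly.
uncle = compose(stored(mother_of), stored(brother_of))
print(uncle("Ann"))  # ['Dan', 'Ed']
```

Because the user fixes the path, the cost is bounded by the sizes of the intermediate results rather than by the number of possible paths in the data base.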

2.3.0  SPECIFICATION 

Users of information, through the virtual information facility,
define their own working environment and the manner in which they
would like to use the physical and underlying data. Such
definitions of virtual information may be accomplished through a
virtual information definition language. The virtual information
facility would accept virtual information definitions and their
modifications in the definition language, and respond to virtual
information retrieval requests through a separate virtual
information retrieval language.

2.4.0  MERITS 

There are several major merits in the support of virtual
information in a data base system. It is dynamic in nature
because its definition may be created, deleted, and modified
readily; its definition applies to every instance of data to
which it is applicable, and yet only one copy of this definition
is stored in the system. By facilitating ease of modification, it
enhances data base flexibility; by eliminating redundant physical
records, it contributes to more consistent data; and by being
procedural in nature, it enhances information accuracy by
delaying the evaluation of data that vary over time or with other
changing factors until their time of use. These merits stem from
virtual information's association with procedural relationships.
For instance, the stored algorithm for computing age would
eliminate the need to update the age attribute day by day, as
would be required if it were physically stored, and would be
applied to calculate anyone's age, thus eliminating redundancy of
stored information.

Virtual information also conserves vast amounts of physical
storage. It makes unnecessary the storage and maintenance of
information that may be derived upon request. This raises the
issue of a Time/Space trade-off, which should be seriously
considered when deciding which kinds of fundamental data are or
are not to be physically stored. Derivation upon request incurs
the added cost of derivation; therefore, information that will be
used many times and is also difficult to derive may be the best
kind of data to store physically, while information that is
seldom used and easy to derive may be the best kind of data not
to store physically. Furthermore, the situation is made even more
complex as we realize that the definitions themselves require
physical storage. Thus, it would not be an easy task to decide
which kinds of data are to be derived, or to be actually stored.
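The trade-off can be made concrete with a simple cost comparison. The sketch below is a hypothetical model, not one proposed in this thesis: costs are in arbitrary units, and it chooses physical storage only when repeated derivation, plus storage of the definition itself, would cost more than storing and maintaining a physical copy.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical cost profile for one kind of information (arbitrary units)."""
    name: str
    uses_per_period: float   # how often the information is requested
    derive_cost: float       # cost of one derivation on request
    definition_cost: float   # storage cost of the virtual definition itself
    storage_cost: float      # cost of storing a physical copy for one period
    update_cost: float       # cost of keeping the physical copy current for one period

def cost_if_derived(c: Candidate) -> float:
    return c.uses_per_period * c.derive_cost + c.definition_cost

def cost_if_stored(c: Candidate) -> float:
    return c.storage_cost + c.update_cost

def should_store(c: Candidate) -> bool:
    """Store physically only when repeated derivation would cost more."""
    return cost_if_derived(c) > cost_if_stored(c)

frequent_and_hard = Candidate("payroll_totals", 500, 2.0, 1.0, 50.0, 10.0)
rare_and_easy = Candidate("age", 3, 0.01, 0.5, 5.0, 30.0)
print(should_store(frequent_and_hard), should_store(rare_and_easy))  # True False
```

The high `update_cost` assumed for "age" reflects the point made earlier: a stored age would have to be updated continually, while its definition never does.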

The definition of virtual information on a per-user basis would
simulate an entire virtual data base for each individual user.
Each one would be free to tailor the data base to his own
preferred view or use through the virtual information
definitions. A particular set of virtual definitions may be very
useful for one group of users, and another set for another group.
In this sense, each user gets a data base suited for his own use
without affecting anybody else's usage of the data base. A
logical extension of this scenario is to implement access control
mechanisms such that users may establish controlled sharing of
sets of virtual information definitions with one another; the
data base administrator may monitor all such sharing to prevent
unauthorized access to a certain set of virtual information
functions. However, in such a scenario, a separate catalogue
would have to be maintained for each and every user, and
considerable catalogue management would be required. Such is the
cost of this individually user-tailored data base functionality,
a secondary merit of the use of Virtual Information.


2.5.0  APPROACH 


The concept of virtual information leads directly to a functional
approach to data bases. A virtual information facility would be
treated as a collection of functions, and retrieved data would be
regarded as function values. Virtual information requests
correspond to function invocations; this functional approach to
information readily supports the procedural relationships on
which the concept of virtual information is based. As a result, a
virtual information facility is likely to closely resemble a
language interpreter, which accepts function definitions and
responds to function invocations with specified arguments.




3.0.0  FUNCTIONALITIES 


There are numerous functionalities to a virtual information
facility, each of which may be implemented to a varying degree of
completeness. Although it may be desirable to implement every
functionality wherever possible, doing so may be impractical and
less than meaningful for the initial version of the
implementation. Thus, we have not implemented the
one-data-base-per-user feature of virtual information
capabilities which we described in the previous chapter. The
remainder of this chapter describes the functionalities of
virtual information which we did implement; certainly, not all of
these implementations are without room for further refinement,
even though they already include an extensive set of virtual
information capabilities.

3.1.0  UNDERLYING  DATA  MODEL 

The virtual information facility lies on top of the entity set
level of the functional hierarchy. In this level, the data base
is seen as a network of entity sets and their attributes. Each
entity set may have a varying number of attributes, some of them
being value attributes and others being entity attributes (Hsu
1980). The value attributes include a set of attribute values,
and the entity attributes represent relationships leading to
other entity sets. Figure 3.1 briefly illustrates this model.
Fig 3.1
3.2.0 ACTIVE WORKSPACE


We have developed an active workspace which incorporates a line
editor with full screen display, through which user commands may
be issued. The workspace consists of two buffers: an execution
buffer and a transaction buffer. The transaction buffer holds a
number of data base statements which are executed sequentially
when the transaction buffer is executed. The execution buffer
holds a single data base statement, which is executed
automatically when the statement is completed. A number of buffer
commands are provided to manipulate buffer contents. The details
of these commands, as well as of the data base statements, are
illustrated in chapter 5.
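The behavior of the two buffers can be sketched as follows. This is a hypothetical Python model, not the PL/1 implementation: the `Workspace` class and its methods are invented names, a statement is assumed complete when it ends with a semicolon, and the back-end is represented by a callback that receives each statement.

```python
from typing import Callable, List

class Workspace:
    """Hypothetical model of the execution and transaction buffers."""

    def __init__(self, execute: Callable[[str], None]) -> None:
        self.execute = execute                   # back-end hook that runs one statement
        self.execution_buffer = ""               # the statement currently being typed
        self.transaction_buffer: List[str] = []  # statements awaiting execution

    def type_line(self, text: str) -> None:
        """Append input; run the statement automatically once it ends with ';'."""
        self.execution_buffer += text
        if self.execution_buffer.rstrip().endswith(";"):
            statement = self.execution_buffer.strip()
            self.execution_buffer = ""
            self.execute(statement)

    def add_to_transaction(self, statement: str) -> None:
        self.transaction_buffer.append(statement.strip())

    def run_transaction(self) -> None:
        """Execute the buffered statements in order, then empty the buffer."""
        for statement in self.transaction_buffer:
            self.execute(statement)
        self.transaction_buffer.clear()

log: List[str] = []
ws = Workspace(log.append)
ws.type_line("Retrieve ({teachers}) ")
ws.type_line("by ({VO} name, income) ;")  # completes the statement, so it runs now
ws.add_to_transaction("Define old as age > 70 ;")
ws.add_to_transaction("Retrieve ({people} where (old)) by ({VO} name) ;")
ws.run_transaction()
print(len(log))  # 3
```

A real parser would also need to ignore semicolons inside quoted strings; the sketch omits that.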
3.3.0 PERMANENTLY DEFINED VIRTUAL INFORMATION


Permanent virtual information may be defined through the Define
statement. Such definitions are stored in a global dictionary, or
so-called catalogue, in the form of character strings, and remain
there until explicitly removed or overwritten by a different
definition. Examples may be found in chapter 5.

3.4.0  ADHOC  VIRTUAL  INFORMATION 

Virtual information definitions may also be made for only the
duration of a single transaction. When all statements within the
transaction have been executed, the adhoc dictionary is erased.
Within the transaction, adhoc definitions may be created,
deleted, and modified at any time. With this feature, each
transaction is associated with a catalogue of its own, and would
not interfere with the concurrent activities of other
transactions executing in parallel. At this stage, we do not
support concurrent transactions, but the adhoc definition
capability is still useful in keeping with the principle of
transactions. The permanent dictionary is, of course, also
accessible from within each transaction.
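One way to picture the adhoc dictionary is as a layer searched before the permanent dictionary and discarded when the transaction ends. The following sketch uses Python's `collections.ChainMap` and assumes that an adhoc definition may shadow a permanent one of the same name; the text above does not specify this precedence, and the `Transaction` class is hypothetical.

```python
from collections import ChainMap
from typing import Dict, Optional

class Transaction:
    """Hypothetical model of a transaction's adhoc dictionary."""

    def __init__(self, permanent: Dict[str, str]) -> None:
        self.adhoc: Dict[str, str] = {}
        # Lookups search the adhoc dictionary first, then the permanent one.
        self.catalogue = ChainMap(self.adhoc, permanent)

    def define(self, name: str, body: str) -> None:
        self.adhoc[name] = body  # create or modify an adhoc definition

    def delete(self, name: str) -> None:
        self.adhoc.pop(name, None)  # removes adhoc definitions only

    def lookup(self, name: str) -> Optional[str]:
        return self.catalogue.get(name)

    def end(self) -> None:
        self.adhoc.clear()  # erase the adhoc dictionary once all statements have run

permanent = {"income": "salary - expenses"}
t = Transaction(permanent)
t.define("old", "age > 70")
t.define("income", "salary - expenses - taxes")  # shadows the permanent entry
print(t.lookup("old"), "|", t.lookup("income"))   # age > 70 | salary - expenses - taxes
t.end()
print(t.lookup("old"), "|", t.lookup("income"))   # None | salary - expenses
```

Writes through `define` touch only the adhoc layer, so the permanent dictionary is never modified from inside a transaction in this sketch.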
3.5.0 NOTION OF A TRANSACTION

A transaction is a body of executable statements joined together
within a single context. This context is provided by the adhoc
dictionary associated with the particular transaction. A
transaction is created within the transaction buffer, and remains
there until it is explicitly overwritten, erased, or executed.
The merits of this transaction concept are threefold: a) a group
of statements which collectively perform a certain task may be
consolidated to exhibit logical unity; b) a shared context may be
created and maintained for each transaction, a sign of
transactional modularity and independence from one another; c)
the execution of the consolidated operations in a transaction may
be put off until a more opportune moment, by which time new
permanent or adhoc virtual information definitions may be defined
either to supplement or to replace existing definitions.

3.6.0  VIRTUAL  ATTRIBUTES 

Virtual attributes equated to the results of computational
algorithms acting on available data, or of designated indirect
references, may be explicitly defined through the Define data
base statement. This feature incorporates support for Computed
Facts as well as for Pseudo Inferred Facts. For instance, the
following shows the definition and usage of two virtual
attributes, income and ship-country: a computed fact and a pseudo
inferred fact, respectively.

Define  income  as  salary  -  expenses  ; 

Retrieve  ({teachers})  by  ({VO}  name, income)  ; 

The foregoing retrieve statement returns two vertical columns of
data: the first column contains the teachers' names, and the
second their corresponding incomes.

Define  ship-country  as  I  company  (  country  (name  ))Q  ; 

Retrieve  ({ship})  by  ({vO}  name,  ship-country)  ; 

The foregoing retrieve statement returns two columns of data, the
first being individual ship names, and the second being the name
of the country to which each ship belongs. The entity diagram for
this scenario is as follows:
Fig 3.2
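The Define/Retrieve pairs above can be sketched as resolving each requested attribute either from stored values or from a virtual definition. In VIFI, definitions are stored as character strings and substituted into statements (sections 3.3.0 and 3.9.0); this Python sketch instead stores callables, a deliberate simplification, and the sample entities and helper names (`path`, `value`, `retrieve`) are hypothetical.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]

# Hypothetical entities; entity attributes hold references to other entities.
usa: Entity = {"name": "USA"}
acme: Entity = {"name": "Acme Lines", "country": usa}
ships: List[Entity] = [{"name": "Nautilus", "company": acme}]
teachers: List[Entity] = [{"name": "Lee", "salary": 900, "expenses": 250}]

def path(*attrs: str) -> Callable[[Entity], Any]:
    """Follow a user-designated chain of attributes (a pseudo inferred fact)."""
    def follow(entity: Entity) -> Any:
        current: Any = entity
        for a in attrs:
            current = current[a]
        return current
    return follow

# Virtual attribute catalogue: name -> derivation procedure.
definitions: Dict[str, Callable[[Entity], Any]] = {
    "income": lambda e: e["salary"] - e["expenses"],     # computed fact
    "ship-country": path("company", "country", "name"),  # pseudo inferred fact
}

def value(entity: Entity, attr: str) -> Any:
    """Use the stored attribute if present, otherwise its virtual definition."""
    return entity[attr] if attr in entity else definitions[attr](entity)

def retrieve(entity_set: List[Entity], attrs: List[str]) -> List[tuple]:
    """One row per entity, one column per requested attribute."""
    return [tuple(value(e, a) for a in attrs) for e in entity_set]

print(retrieve(teachers, ["name", "income"]))     # [('Lee', 650)]
print(retrieve(ships, ["name", "ship-country"]))  # [('Nautilus', 'USA')]
```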


3.7.0  CONDITIONS  ON  REAL  OR  VIRTUAL  ATTRIBUTES 


Arbitrary conditions on real or virtually defined attributes may
be defined by INFOPLEX users as named, shared conditions on their
data values, from which factored facts may later be constructed.
For example:

Define  old  as  age  >  70  ; 

Define  rich  as  assets  >  1000000  ; 

Retrieve ({people} where (rich and old)) by ((VO) name) ;

The foregoing retrieve statement would return a list of names of
those people whose age > 70 and assets > 1000000.

3.8.0  VIRTUAL  ENTITY  SETS 

Aside from virtual attributes, we also support a basic notion of
virtual entity sets. We recognize two kinds of virtual entity
sets:

a)  Union  or  intersection  of  real  or  previously  defined  virtual 
entity  sets  based  on  their  real  and  virtual  attribute  values. 

b)  Subsetting  of  real  or  virtual  entity  sets  based  on  certain 
conditions  on  their  real  and  virtual  attribute  values. 
For instance:


Define ClassAB as {ClassA} MU (Name) {ClassB} ;

ClassAB is defined as the result of a multiple-union operation on
entity sets ClassA and ClassB, based on a common attribute called
Name.

Define  RichMen  as  {Men}  where  (assets  >  1000000)  ; 

RichMen is defined as a virtual subset of the set Men, based on
the values of its assets attribute.

The complete set of union and intersection operators, as well as
the cartesian product operator between entity sets, is
illustrated in chapter 5. Also, refer to chapter 5 for details of
the capability to specify various conditional predicates on
attribute values.
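The two kinds of virtual entity sets can be sketched as set-valued procedures. The exact semantics of the MU operator are given in chapter 5, which is not reproduced here; the sketch assumes that a multiple union on Name merges entities sharing the same Name value, and all function names and data are hypothetical.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]

def union_on(a: List[Entity], b: List[Entity], key: str) -> List[Entity]:
    """Assumed MU semantics: entities with equal `key` values merge into one entity."""
    merged: Dict[Any, Entity] = {}
    for e in a + b:
        merged.setdefault(e[key], {}).update(e)
    return list(merged.values())

def intersect_on(a: List[Entity], b: List[Entity], key: str) -> List[Entity]:
    """Keep entities of `a` whose `key` value also occurs in `b`."""
    keys_in_b = {e[key] for e in b}
    return [e for e in a if e[key] in keys_in_b]

def where(s: List[Entity], cond: Callable[[Entity], bool]) -> List[Entity]:
    """Virtual subset: entities satisfying a condition on their attribute values."""
    return [e for e in s if cond(e)]

class_a = [{"Name": "Kim", "grade": "A"}, {"Name": "Ray", "grade": "B"}]
class_b = [{"Name": "Ray", "year": 2}, {"Name": "Sue", "year": 3}]
men = [{"Name": "Al", "assets": 2_000_000}, {"Name": "Bo", "assets": 50_000}]

# Defined as procedures so they are re-evaluated against current data on every use.
def class_ab() -> List[Entity]:
    return union_on(class_a, class_b, "Name")

def rich_men() -> List[Entity]:
    return where(men, lambda e: e["assets"] > 1_000_000)

print(sorted(e["Name"] for e in class_ab()))                        # ['Kim', 'Ray', 'Sue']
print([e["Name"] for e in intersect_on(class_a, class_b, "Name")])  # ['Ray']
print([e["Name"] for e in rich_men()])                              # ['Al']
```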

3.9.0  GENERALIZED  MACRO  FACILITY 

Users can create arbitrary definitions and give them specific
names by which the definitions may be referred to and later
substituted into data base statements. In this sense, the Define
statement may be used not only to define virtual attributes and
virtual entity sets, but also arbitrary definitions, even if
those definitions are seemingly incoherent without the proper
context. When a retrieval statement is to be executed, all words
within the statement are first checked against a list of stored
definition names; any matching definition is recalled from the
dictionary and put in the place of the matching definition name
in the retrieval statement. Chapter 5 includes a detailed
description of such usage.
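The substitution step can be sketched as repeated word-level replacement until no definition name remains. This is a hypothetical Python model: the word pattern, the depth limit used to catch circular definitions, and the sample dictionary (which reuses the old and rich conditions of section 3.7.0, plus an invented name "elite") are assumptions.

```python
import re
from typing import Dict

# Assumed word pattern: a letter followed by letters, digits, '_' or '-'.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

dictionary: Dict[str, str] = {
    "old": "age > 70",
    "rich": "assets > 1000000",
    "elite": "(rich and old)",  # a definition built from other definitions
}

def expand(statement: str, defs: Dict[str, str], max_depth: int = 10) -> str:
    """Replace every word matching a stored definition name, until nothing changes."""
    for _ in range(max_depth):
        expanded = WORD.sub(lambda m: defs.get(m.group(0), m.group(0)), statement)
        if expanded == statement:
            return expanded
        statement = expanded
    raise ValueError("definitions nest too deeply or are circular")

print(expand("Retrieve ({people} where elite) by ({VO} name) ;", dictionary))
# Retrieve ({people} where (assets > 1000000 and age > 70)) by ({VO} name) ;
```

Allowing '-' inside words keeps names such as ship-country intact, at the price of treating an unspaced expression like a-b as a single word.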

3.10.0  EXTENDABLE  FUNCTIONALITIES 

3.10.1 USER DEPENDENT VIRTUAL DEFINITIONS


This particular functionality is not difficult to implement, but
it may be unnecessary at this stage of the project. It would
simply require a separate catalogue for each user which includes
an access control list, proper search rules including default
situations, and adequate coordination and control mechanisms to
manage the various catalogues. It would increase the cost in
terms of time and space efficiency. Thus, we have not included
this functionality in this version of the virtual information
implementation. Nevertheless, if circumstances at a later time
make support for user dependent catalogues desirable enough to
more than compensate for its cost of implementation, this
functionality may be added readily.
3.10.2  INFERRED  FACTS  OF  UNDESIGNATED  INDIRECTION


Inferred facts with undesignated indirection, as opposed to
pseudo inferred facts with designated indirection, are likely
to impose tremendous costs on system performance whenever they
are implemented. As previously stated, they would require
either an exhaustive search or a certain level of artificial
intelligence, both of which consume large amounts of computing
power, storage, and time. Furthermore, to verify that the
indirection the system chooses at each step is correct, the
user would have to monitor the computer's decisions
interactively; this defeats the original purpose of not
requiring the user to designate the intended path of
indirection. Thus, it seems very doubtful that this
functionality will ever be implemented unless the requirement
for user monitoring of the decision process is somehow
eliminated.
4.0.0  PROGRAM  STRUCTURE


Our implementation pays special attention to modularity. Each
primary module incorporates numerous internal sub-modules whose
existence is neither known to nor relevant to the
implementation of other primary modules. Aside from the PL/1
modules, we have designed a data base language and a finite
state push-down automaton, each of which is also categorized
as a single module. Figure 4.1 illustrates the control
structure and data flow of all the modules in our
implementation. Single arrowheads in the diagram represent
control transitions, and double arrowheads represent data flow.

4.1.0  MODULE  DESCRIPTION

The front-end as designed and implemented in this thesis
includes the following modules:

(1)  USER-INTERFACE

(2)  BUFFER

(3)  ACTIVITY-COORDINATOR

(4)  TOKENIZER-PROCESSOR

(5)  LANGUAGE  DESIGN  and  SPECIFICATION

(6)  FINITE-STATE-PUSH-DOWN  AUTOMATON  (MACHINE)

All programs are written in PL/1 under the CMS operating system
on IBM VM/370.

4.1.1  USER-INTERFACE


This module is the PL/1 program named USINT. It is currently
the main program (the procedure declared OPTIONS(MAIN)) of the
entire virtual information facility. It diverts control to one
of two INFOPLEX implementations, one of which includes our
virtual information facility and the other of which does not.
Once the user selects the implementation with the virtual
information facility, this module serves as the communication
link between the BUFFER module, which interacts with the user,
and the ACTIVITY-COORDINATOR module on the lower level, which
supervises the execution of data base statements.

This module has the following internal routines:

(1)  XBUFF

(2)  GETS

(3)  REPLACE  (internal  to  RETDSPLY)

(4)  RETDSPLY

The XBUFF routine strips individual executable data base
statements off the buffer one by one and passes them down to
the next level for further processing.
The GETS routine is a general tool that extracts the substring
running from the first character of a given string up to the
first occurrence of a given character. After this routine
executes, the portion of the original string up to and
including the given character is removed.

For instance:

str1  =  'abc$def'

Gets ( str1, '$' ) returns 'abc', and the value of str1
becomes 'def'.
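A Python sketch of GETS follows. Python strings are immutable, so instead of shortening its argument in place the function returns both the prefix and the remainder. What happens when the delimiter is absent is not stated in the report. The choice below, where the whole string is consumed, is an assumption.

```python
def gets(s, delim):
    """Sketch of GETS: return (prefix, remainder).

    prefix is s up to (not including) the first delim; remainder is what
    follows delim, mirroring how GETS shortens its string argument.
    """
    idx = s.find(delim)
    if idx == -1:
        # Assumption: with no delimiter, the whole string is consumed.
        return s, ""
    return s[:idx], s[idx + len(delim):]


str1 = "abc$def"
part, str1 = gets(str1, "$")
print(part, str1)  # abc def
```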

The REPLACE routine is also a general tool; it replaces all
occurrences of a given varying character string of length two
or less by another varying character string of length two or
less.

For instance:

str1  =  'abc,def'

Replace ( str1, ',', '55' ) changes the value of str1 to
'abc55def'.
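REPLACE can be sketched in Python as follows, enforcing the two-character limit the report describes. The second call shows the kind of retranslation RETDSPLY performs, turning the "%1" produced by TRNSLATE back into a doubled quote. The error handling for over-long strings is an assumption.

```python
def replace(s, old, new):
    """Sketch of REPLACE: substitute every occurrence of old with new.

    The report limits both strings to at most two characters.
    """
    if not 1 <= len(old) <= 2 or len(new) > 2:
        raise ValueError("REPLACE handles strings of at most two characters")
    return s.replace(old, new)


print(replace("abc,def", ",", "55"))      # abc55def
print(replace("'O%1Brien'", "%1", "''"))  # 'O''Brien'
```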

The RETDSPLY routine simply displays the retrieval statement
currently being processed, so that subsequent output can be
matched to that statement. It makes numerous calls to REPLACE
because many characters have been previously translated to
enable the application of the finite state machine.

4.1.2  BUFFER 


The BUFFER module is the PL/1 program named NEWBUF. It
interacts with the user continuously during a virtual
information session. It has a transaction buffer, which
corresponds to the "transaction" concept of virtual information
and accumulates successive data base statements until the
entire transaction is to be executed. It also has an execution
buffer, which is executed automatically upon the completion of
a single executable data base statement. In the context of this
module, "execution" simply means returning control to the
calling module, USER-INTERFACE. When returning control to the
caller, if the transaction buffer is to be executed, its
content is moved to the execution buffer; if the user has
requested the termination of the virtual information session,
a control bit passed in from USER-INTERFACE is set.

To make virtual information easier to use, this module
incorporates a simple full-screen line editor coupled with the
existing buffer commands. Buffer commands enable moving data
from buffer to buffer, executing either buffer's content,
reading buffer content from a CMS file, saving buffer content
to a CMS file, and terminating the active session. The editor
commands are INSERT, DELETE, and TOPLINE; they enable simple
editing of the transaction buffer content. The usage of these
editor commands and buffer commands is described in Chapter 5.

In essence, this module establishes the Active Workspace
environment described earlier. It is the primary module of the
external level of the functional hierarchy, developed
specifically for use with the virtual information facility on
the next lower level.

This module has the following internal routines:

(1)  LDSPCH

(2)  BUILDBUFF

(3)  TRNSLATE

(4)  FINPUT

(5)  KBLKS

(6)  GETS

(7)  NEXTWORD

(8)  RMVFBLKS  (internal  to  NEXTWORD)

(9)  FSAVE

(10)  WAIT

(11)  REPLACE

(12)  HELP

(13)  SETMKS

(14)  BDISPLAY

(15)  DELTE

(16)  WTHNLM

The LDSPCH routine preserves the format of each line the user
enters by constructing a header that is concatenated to the
front of the line. The header begins with a marker character,
followed by a numeric character string giving the number of
leading blank characters in the line, and ends with a ":"
character. In this manner, all leading blank characters of each
line may be removed. Such a header has two advantages: storage
is conserved, and a fixed structure is imposed on all user
input, which reduces complexity.
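The header scheme can be sketched in Python as follows. MARKER is a hypothetical placeholder for the header's leading character. The restore function is included only to show that the original line can be rebuilt from the header. It is not one of the report's routines.

```python
MARKER = "@"  # hypothetical stand-in for the header's leading character


def add_header(line):
    """Strip leading blanks and record their count in a header."""
    text = line.lstrip(" ")
    count = len(line) - len(text)
    return f"{MARKER}{count}:{text}"


def restore(stored):
    """Rebuild the original line from its header."""
    count, _, text = stored[len(MARKER):].partition(":")
    return " " * int(count) + text


line = "    by ( {v0} name ) ;"
stored = add_header(line)
print(stored)                   # @4:by ( {v0} name ) ;
print(restore(stored) == line)  # True
```

Splitting on the first ":" is safe because the count between the marker and the colon consists only of digits.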

The BUILDBUFF routine constructs either the execution or the
transaction buffer one line at a time from each user input
line. The markers on either buffer are repositioned to enable
proper editor display. It returns a boolean value of "true" if
there is at least one completed data base statement in the
current buffer.


The TRNSLATE routine checks for missing quote terminators on
character string constants and missing back-slash terminators
on comment lines. It also translates characters within comments
to "%2", and translates consecutive single quote characters
within character string constants, which represent an actual
quote character, to "%1". Such translations are necessary to
avoid ambiguity and complications in the input recognition
stages of the process.
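The quote-handling part of TRNSLATE can be sketched in Python as a small scanner. Only the doubled-quote translation and the missing-terminator check are modelled here. Comment handling is left out, and the function name and error type are assumptions.

```python
def translate_quotes(line):
    """Replace each doubled quote inside a string constant with "%1".

    Raises ValueError when a string constant is not terminated,
    mirroring TRNSLATE's check for missing quote terminators.
    """
    out = []
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == "'":
            if in_string and line[i + 1:i + 2] == "'":
                out.append("%1")  # '' inside a constant is a literal quote
                i += 2
                continue
            in_string = not in_string  # opening or closing quote
        out.append(ch)
        i += 1
    if in_string:
        raise ValueError("missing quote terminator")
    return "".join(out)


print(translate_quotes("where ( name = 'O''Brien' ) ;"))
# where ( name = 'O%1Brien' ) ;
```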


The FINPUT routine reads the transaction buffer content from a
CMS file whose file name is "file" and whose file type is given
by the user through the "finput" buffer command. The original
transaction buffer content is erased. Certain characters are
replaced by "%4", "%0", "%3", and "%5" so that they do not
interfere with the finite state machine command language.


The KBLKS routine removes all leading blank characters from
the current input line and keeps the number removed in the
variable "ldspaces".


The GETS routine is a general tool as described earlier within
the USER-INTERFACE module.

The NEXTWORD routine returns the sequence of characters in the
input line up to but not including the first blank character.
If no blank character is found, the entire input line is
returned. Comments are removed automatically and are not
recognized as part of an input line.

The RMVFBLKS routine is internal to NEXTWORD; it removes the
blank characters preceding each word in the input line. Its
name stands for remove-front-blanks.
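NEXTWORD and RMVFBLKS can be sketched together in Python as follows. The (word, rest) return convention is an assumption made because Python strings cannot be shortened in place, and comment removal is not modelled.

```python
def rmvfblks(line):
    """Remove the blanks in front of the next word (remove-front-blanks)."""
    return line.lstrip(" ")


def nextword(line):
    """Return (word, rest): word runs up to, not including, the first blank."""
    line = rmvfblks(line)
    idx = line.find(" ")
    if idx == -1:
        return line, ""  # no blank: the whole line is the word
    return line[:idx], line[idx:]


rest = "   define old as age > 60 ;"
words = []
while rest.strip():
    word, rest = nextword(rest)
    words.append(word)
print(words)  # ['define', 'old', 'as', 'age', '>', '60', ';']
```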

The FSAVE routine is the counterpart to the FINPUT routine. It
writes the current transaction buffer content into a CMS file
whose file name is "file" and whose file type is given by the
user through the "fsave" buffer command. The characters
translated by the FINPUT and TRNSLATE routines are restored
before they are written into the CMS file.

The WAIT routine serves as a time delay that holds messages to
the user on display terminals long enough to be read.

The REPLACE routine is a general tool as described earlier
within the USER-INTERFACE module.

The HELP routine displays a brief explanation of each buffer
command on the display terminal.

The SETMKS routine sets or resets markers in either the
execution or the transaction buffer for the buffer-display
purposes of the full-screen line editor. The name "setmks"
stands for "set-markers".
The BDISPLAY routine displays the contents of both the
execution and the transaction buffers. It implements the
full-screen characteristic of our line editor.

The  DELTE  routine  implements  the  delete-line  function  of  the 
editor.  The  transaction  buffer  markers  are  properly  reset 
after  each  invocation  of  this  routine. 

The WTHNLM routine verifies the logical correctness of the
parameters of editor commands. If the parameters are outside
the current buffer boundaries, the routine returns '0'B.

4.1.3  ACTIVITY  COORDINATOR 

This module coordinates all activities on the level of virtual
information processing. It directs the movement of program
control through the various modules on this level. A number of
debugging tools, which print out various trees, token chains,
and tables, are included within this module and can be used
when needed by inserting a "call" statement anywhere within
the module.

This module contains the following internal routines:

(1)  GETS

(2)  PRINTT

(3)  PRINTM

(4)  PRINTX

(5)  PRINTE

(6)  PRINTR

The GETS routine is a general tool as described earlier within
the USER-INTERFACE module.

The  PRINTT  routine  is  a  debugging  tool  which  can  be  used  to 
print  the  chain  of  input  tokens. 

The PRINTM routine is a debugging tool which can be used to
print a snapshot of the finite state machine.

The  PRINTX  routine  is  a  debugging  tool  which  can  be  used  to 
print  the  execution  tree. 

The  PRINTE  routine  is  a  debugging  tool  which  can  be  used  to 
print  the  entity  set  table. 

The  PRINTR  routine  is  a  debugging  tool  which  can  be  used  to 
print  the  revised  entity  set  table. 

4.1.4  TOKENIZER-PROCESSOR
This module tokenizes retrieval statements and executes
"define", "adhoc", and "listdef" statements. Apart from
USER-INTERFACE, which makes one call to the dictionary for
initialization, it is the only module of the virtual
information facility that communicates with the dictionary of
virtual information definitions. When each retrieval statement
is tokenized, virtual definitions are recalled from the
dictionary whenever appropriate and substituted directly into
the retrieval statement.

This module contains the following internal routines:

(1)  GETS

(2)  NXTKSTR

(3)  RMVFBLKS  (internal  to  NXTKSTR)

(4)  TOK1  (internal  to  NXTKSTR)

(5)  DEF

(6)  BDTKCHN

(7)  MSG

(8)  LISTDEF

(9)  DEFDSPLY  (internal  to  LISTDEF)

(10)  REPLACE

The GETS routine is a general tool as described earlier within
the USER-INTERFACE module.


The NXTKSTR routine is the core of the tokenizing process; it
recognizes the next token in the input stream in the form of a
character string. Each token is a separately recognizable
entity. This routine is called repeatedly by the BDTKCHN
routine, which builds an entire chain of tokens.

The RMVFBLKS routine is the same routine as described in the
BUFFER module.

The TOK1 routine is the main body of the NXTKSTR routine. It
recognizes the next portion of the input string, which is to be
transformed into a separate token.

The DEF routine executes the "define" and "adhoc" data base
statements. It creates and modifies virtual information
definitions in the dictionary of virtual definitions.

The BDTKCHN routine builds an entire chain of linked tokens.
Each retrieval statement is transformed into such a chain of
separate tokens before further processing.
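A Python sketch of a BDTKCHN-style linked token chain follows. A single regular expression stands in for NXTKSTR/TOK1. Its token classes are hypothetical: string constants, numbers, names that may contain '#' (as in sum#a#b), and single-character symbols. The report's actual token rules may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical token classes: 'strings', numbers, names, single symbols.
TOKEN_RE = re.compile(
    r"'[^']*'|\d+(?:\.\d+)?|[A-Za-z#][A-Za-z0-9#]*|[{}(),;<>=+\-*/!|]"
)


@dataclass
class Token:
    text: str
    link: Optional["Token"] = None  # next token in the chain


def bdtkchn(statement):
    """Build a singly linked chain of tokens and return its head."""
    head = tail = None
    for text in TOKEN_RE.findall(statement):  # plays the role of NXTKSTR
        token = Token(text)
        if head is None:
            head = tail = token
        else:
            tail.link = token
            tail = token
    return head


def chain_texts(head):
    """Walk the chain and collect the token texts."""
    texts = []
    while head is not None:
        texts.append(head.text)
        head = head.link
    return texts


print(chain_texts(bdtkchn("retrieve ( {es1} ) by ( {v0} x+3 ) ;")))
```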

The  MSG  routine  outputs  a  message  line  to  the  terminal  and 
prompts  the  user  to  press  the  "enter"  key  to  continue. 


The LISTDEF routine executes "listdef" data base statements. It
recalls from the dictionary the definition which is to be
listed and outputs that definition to the terminal.

The DEFDSPLY routine is internal to the LISTDEF routine. It
prepares a stored definition for terminal display.
Retranslation is needed to reconstruct the original characters
that were previously translated.

The REPLACE routine is a general tool as described earlier
within the USER-INTERFACE module.

4.1.5  LANGUAGE  DESIGN  and  SPECIFICATION 

This  module  incorporates  the  design  and  formal  specification 
of  a  data  base  language  which  defines  and  retrieves  virtual 
information.  Chapter  5  is  devoted  exclusively  to  explaining 
and  describing  this  module. 

4.1.6  FINITE  STATE  AUTOMATON  (MACHINE) 

This module parses the data base statements written in the
language illustrated in Chapter 5. It consists of a set of
Match-Action-Nextstate rules, which form one of the inputs to a
generalized parse program written by Peter Lu. A change in the
grammar of our data base language corresponds readily to
changes in the rules of this finite state machine, which frees
us from changing the parse program itself. A separate chapter
is devoted exclusively to explaining the workings of these
rules.
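The Match-Action-Nextstate idea can be illustrated with a small table-driven recognizer in Python for "define", "def", and "listdef" statements. This is not Peter Lu's parse program. The rule format, state names, and actions are all hypothetical, and the push-down stack that the full automaton needs for nested parentheses is omitted.

```python
import re

IDENT = object()  # hypothetical match class: any name
ANY = object()    # hypothetical match class: any token


def matches(pattern, token):
    if pattern is ANY:
        return True
    if pattern is IDENT:
        return re.fullmatch(r"[A-Za-z][A-Za-z0-9#]*", token) is not None
    return pattern == token


# state -> list of (match, action, next state); the first matching rule wins.
RULES = {
    "start": [("define", None, "name"), ("def", None, "name"),
              ("listdef", None, "list")],
    "name": [(IDENT, "set_name", "as")],
    "as": [("as", None, "body"), ("remove", "set_remove", "end")],
    "body": [(";", None, "accept"), (ANY, "add_body", "body")],
    "list": [(IDENT, "set_name", "end")],
    "end": [(";", None, "accept")],
}

ACTIONS = {
    "set_name": lambda result, tok: result.update(name=tok),
    "set_remove": lambda result, tok: result.update(remove=True),
    "add_body": lambda result, tok: result["body"].append(tok),
}


def run_machine(tokens):
    state, result = "start", {"name": None, "body": [], "remove": False}
    for tok in tokens:
        for pattern, action, next_state in RULES.get(state, []):
            if matches(pattern, tok):
                if action:
                    ACTIONS[action](result, tok)
                state = next_state
                break
        else:
            raise SyntaxError(f"unexpected {tok!r} in state {state!r}")
    if state != "accept":
        raise SyntaxError("incomplete statement")
    return result


print(run_machine("define old as age > 60 ;".split()))
```

Supporting a new statement form only requires new rows in RULES. The run_machine loop stays unchanged, which mirrors the point made above about grammar changes.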


4.2.0  INTERNAL  GLOBAL  VARIABLES 

USER-INTERFACE  MODULE:

execbuff  --  execution buffer
trnsbuff  --  transaction buffer
firstlast  --  passed to BUFFER and used to signal the end of
the session
line  --  used to hold the user-input line

BUFFER  MODULE:

execbuff, trnsbuff, firstlast  (same as in USER-INTERFACE)
prstline  --  current input line, char(80)
strnsp  --  prstline, stripped of leading and trailing spaces
cplnvar  --  strnsp, char(80) varying
ldspaces  --  number of leading spaces on the input line
key  --  current input word to be investigated

ACTIVITY-COORDINATOR  MODULE:

DICTIONARY  --  virtual information dictionary
ENTITY  --  entity set representation
TOKEN  --  token representation
MACH  --  finite state machine representation
XTREE  --  execution tree representation
XCHNGE  --  entity set table representation
UNIT  --  current data base statement to be processed
TKLSPTR  --  pointer to the list of input tokens
GO  --  indicator to proceed beyond the TOKENIZER-PROCESSOR
stage


TOKENIZER-PROCESSOR  MODULE:

unit  --  current data base statement
tklsptr  --  pointer to the list of input tokens
diction  --  dictionary
go  --  indicator to continue processing (set only for
retrieval statements)
kind  --  numeric indicator for arithmetic and string
constants
word  --  first word of the data base statement
5.0.0  LANGUAGE  ILLUSTRATION  AND  SPECIFICATION


This section contains an illustration and a formal
specification of the data base language implemented on the
virtual information facility, as well as the buffer commands
that provide an interactive environment in which virtual
information processing may be carried out.

5.1.0  DATA  BASE  STATEMENTS 

5.1.1  DEFINE  STATEMENTS 

A user may define the character string x as a name for the
macro definition y with either of the following statements:

define x as y ;
def x as y ;

For  instance: 

define  currentyear  as  birthyear  +  age  ; 

define  old  as  age  >  60  ; 

define  employee  as  {worker}  ; 

def  a  as  2+3+4/5*8  ; 

def inches as centimeters / 2.54 ;

Using define statements to simulate functions:

define sum#a#b as #a + #b ;

This specifies a function with two arguments, #a and #b. When
evaluated, sum#2#4 yields 6.
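Expansion of such a parameterized definition can be sketched in Python as follows. The storage format, a name mapped to (parameter names, body), is an assumption. The sketch also assumes the body's words are separated by blanks, as they are in the examples.

```python
def expand_call(call, functions):
    """Expand a call such as sum#2#4 using a parameterized definition.

    functions maps a name to (parameter names, body text); body words
    are assumed to be separated by blanks, as in the examples.
    """
    name, *args = call.split("#")
    params, body = functions[name]
    if len(args) != len(params):
        raise ValueError(f"{name} expects {len(params)} arguments")
    bindings = {"#" + p: a for p, a in zip(params, args)}
    return " ".join(bindings.get(word, word) for word in body.split())


functions = {
    "sum": (["a", "b"], "#a + #b"),                    # define sum#a#b ...
    "avg": (["x", "y", "z"], "( #x + #y + #z ) / 3"),  # adhoc avg#x#y#z ...
}
print(expand_call("sum#2#4", functions))  # 2 + 4
```

The expanded text "2 + 4" is then evaluated like any other arithmetic expression and gives 6.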

To remove existing definitions:

define  age  remove  ; 
define  age  rem  ; 
define  a  remove  ; 

5.1.2  ADHOC  STATEMENTS 

Adhoc statements are similar to define statements; the only
difference lies in the target catalogue. Adhoc statements
operate on the adhoc dictionary, and define statements operate
on the permanent dictionary.

adhoc x as y ;

adhoc currentyear as 1982 ;
adhoc sgfhg as kruilko ;
adhoc age rem ;

adhoc avg#x#y#z as ( #x + #y + #z ) / 3 ;

In both define and adhoc statements, virtual definitions may
be built on top of other virtual definitions; our
implementation allows a maximum of 10 nested levels of virtual
information definition. Thus, if a user enters a recursive
definition, the system terminates the entire process of
replacing definition names by their associated definitions at
the eleventh attempt to replace the same definition name.
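A Python sketch of repeated substitution with a nesting limit follows. As a simplification, it counts passes over the whole statement rather than attempts per definition name. It also signals the overflow with RecursionError, which is an assumed choice.

```python
def expand(statement, definitions, max_levels=10):
    """Repeatedly substitute definition names, allowing max_levels of nesting.

    Raises RecursionError if names remain after max_levels passes,
    e.g. for a recursive definition.
    """
    words = statement.split()
    for _ in range(max_levels):
        if not any(w in definitions for w in words):
            break
        words = [part
                 for w in words
                 for part in (definitions[w].split() if w in definitions else [w])]
    if any(w in definitions for w in words):
        raise RecursionError(f"more than {max_levels} nested definition levels")
    return " ".join(words)


defs = {"senior": "old and retired", "old": "age > 60"}
print(expand("senior", defs))  # age > 60 and retired
```

A recursive definition such as "loop" defined as "loop + 1" never runs out of names, so it is stopped after ten passes instead of looping forever.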


5.1.3  LISTDEF  STATEMENTS 

These statements list stored definitions in the dictionary by
name; the search order is adhoc --> permanent, so an adhoc
definition hides a permanent definition of the same name.

listdef  age  ; 
listdef  employee  ; 
listdef  sgfhg  ; 
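The adhoc-before-permanent search order maps naturally onto Python's collections.ChainMap, which looks a key up in each of its mappings in turn. The dictionary contents below are hypothetical and based on the examples in this chapter. The output format of listdef and its "no such definition" message are assumptions.

```python
from collections import ChainMap

# Hypothetical dictionary contents based on the examples above.
permanent = {"currentyear": "birthyear + age", "employee": "{worker}"}
adhoc = {"currentyear": "1982", "sgfhg": "kruilko"}

# ChainMap searches its maps in order: adhoc first, then permanent.
lookup = ChainMap(adhoc, permanent)


def listdef(name):
    if name not in lookup:
        return f"{name}: no such definition"
    return f"{name}: {lookup[name]}"


print(listdef("currentyear"))  # currentyear: 1982
print(listdef("employee"))     # employee: {worker}
```

Because ChainMap holds references to both dictionaries, discarding the adhoc entries at the end of a transaction makes the permanent definitions visible again.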

5.1.4  RETRIEVE  STATEMENTS 

Our retrieve statements can retrieve the following kinds of
virtual information from either real or virtual entity sets:

a)  computed facts

b)  implied facts

c)  factored facts

Computed facts are information derived from an algorithmic
computation on existing data; implied facts are information
derived from indirect associations; factored facts are those
instances of a particular group of facts which share a certain
condition on their attribute values.

We support the following computational operators, with four
levels of precedence, left-to-right evaluation within each
level, and parentheses to override precedence:

+  ,  -  ,  |    lowest order of precedence
(plus, minus, and concatenate; binary infix operators)

*  ,  /    next order of precedence
(multiplication and division; binary infix operators)

!    next order of precedence
(exponentiation operator)

+  ,  -    highest order of precedence
(arithmetic prefix operators)
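The four precedence levels can be sketched as a recursive-descent evaluator in Python, with one function call per level. This is an illustration only. It handles numbers alone, so the "|" concatenation operator is omitted. Following the text, every binary level, including "!", groups left to right.

```python
import re

# Binary levels from lowest to highest precedence ("|" omitted: numbers only).
LEVELS = [("+", "-"), ("*", "/"), ("!",)]


def evaluate(source):
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/!()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def binary(level):
        nonlocal pos
        if level == len(LEVELS):
            return prefix()
        value = binary(level + 1)
        while peek() in LEVELS[level]:  # left to right within a level
            op = tokens[pos]
            pos += 1
            rhs = binary(level + 1)
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "*":
                value *= rhs
            elif op == "/":
                value /= rhs
            else:
                value **= rhs
        return value

    def prefix():
        nonlocal pos
        tok = peek()
        if tok in ("+", "-"):  # prefix operators bind tightest
            pos += 1
            operand = prefix()
            return -operand if tok == "-" else operand
        if tok == "(":
            pos += 1
            value = binary(0)
            pos += 1  # skip the closing ")"
            return value
        pos += 1
        return float(tok)

    return binary(0)


print(evaluate("2*3!2"))  # 18.0
print(evaluate("2!3!2"))  # 64.0, i.e. (2!3)!2
```

The second example shows a consequence of the left-to-right rule: unlike many languages, exponentiation here does not group from the right.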

Aside from these built-in operators, we also support a number
of built-in functions, as enumerated below.

Functions with no arguments:

DATE    usage -->  nextdate = STR (DATE , 2:'$') + 1 ;

First, the STR (string) function is used to obtain the relevant
portion of the value returned by the DATE function, and then
this value is incremented by 1.

A date in the system may be stored in the form
month$date$year, or in any other predetermined manner. The STR
function is well suited to extracting the relevant portions of
data stored in this form.

Functions with one argument:

These functions operate on entire entity sets; in this sense,
they are vertical operators, not the unilateral kind we are
usually familiar with.

Any valid expression may serve as an argument to a built-in
function.

MAX(y)    usage -->  max (length + width + height)

refers to that particular instance of the entity set whose
dimensions have a greater sum than those of all other members
of the set.

retrieve ( {employee} where ( salary = max (salary) ) )
by ( {v0} name ) ;

gets the name of the employee who earns the highest salary.

SUM(y)  usage  -->  where  (  sum  (  y  )  =  100  ) 

yields  true  if  the  sum  of  all  instances 
of  y  in  the  current  entity  set  equals  100. 

MIN( a+b-5 )

For each member of the set, the value of the argument
expression is first calculated; then the minimum of them all
is taken.


ABS(x+y+z)

returns the absolute value of the argument expression for each
member of the entity set.

SGN(index)  --  returns -1 if the argument is negative,
returns 0 if the argument is zero,
returns 1 if the argument is positive.

SUM(x!2)

sums up the squares of the variable x, yielding one single
value.

POS(v)  --  returns a boolean value for each instance of
attribute v in the entity set.

ZER(x)  --  returns a boolean value for each instance of
attribute x in the entity set.
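The difference between the vertical functions (MAX, MIN, SUM) and the per-member ones (such as SGN) can be sketched in Python. The entity set is modelled as a hypothetical list of attribute dictionaries, and the helper name vertical is an assumption. The first query mirrors the MAX(salary) retrieve statement given earlier.

```python
# Hypothetical entity set: each member is a dict of attribute values.
employees = [
    {"name": "Adams", "salary": 30000},
    {"name": "Baker", "salary": 45000},
    {"name": "Clark", "salary": 38000},
]


def vertical(aggregate, entity_set, expr):
    """Evaluate expr for every member, then reduce with max, min, or sum."""
    return aggregate(expr(member) for member in entity_set)


def sgn(x):
    """SGN: -1, 0, or 1 according to the sign of x (applied per member)."""
    return (x > 0) - (x < 0)


# retrieve ( {employee} where ( salary = max (salary) ) ) by ( {v0} name ) ;
top = vertical(max, employees, lambda e: e["salary"])
print([e["name"] for e in employees if e["salary"] == top])  # ['Baker']

# SUM(salary!2): the sum of squares, one single value
print(vertical(sum, employees, lambda e: e["salary"] ** 2))  # 4369000000
```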

Functions of more than one argument:

STR ( b , nth occurrence of 'x' , mth occurrence of 'y' )

returns the substring of b from the nth occurrence of 'x' to
the mth occurrence of 'y', exclusive of both.

usage -->  STR ( b , 4:'x' , 5:'y' )
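The two-position form of STR can be sketched in Python as follows. Both occurrence counts are assumed to start from the beginning of b. Returning an empty string when a marker is missing, or when the markers are out of order, is also an assumption. The one-position form used in the DATE example is not modelled.

```python
def nth_index(s, ch, n):
    """Index of the n-th (1-based) occurrence of ch in s, or -1 if absent."""
    idx = -1
    for _ in range(n):
        idx = s.find(ch, idx + 1)
        if idx == -1:
            return -1
    return idx


def str_fn(b, n, x, m, y):
    """STR(b, n:'x', m:'y'): text between the n-th x and the m-th y, exclusive."""
    start = nth_index(b, x, n)
    end = nth_index(b, y, m)
    if start == -1 or end == -1 or end <= start:
        return ""  # assumption: empty result when the markers are missing
    return b[start + len(x):end]


print(str_fn("a$b$c$d$xyz$e", 4, "$", 5, "$"))  # xyz
```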
A retrieve statement has the following basic form:

retrieve ( list of real and/or virtual entity sets,
separated by commas, each with an optional predicate
clause )
by ( entity set designation
list of items to be retrieved ) ;

The first set in the list is known as {v0}.

The second set in the list is known as {v1}.

...

The tenth set in the list is known as {v9}.

A maximum of ten such sets is permitted on this level.

The following examples demonstrate the functionality of the
retrieve statement:

retrieve ( {es1} ) by ( {v0} x ) ;

gets all the "x" attributes of entity set "es1".

retrieve ( {es1} ) by ( {v0} x+3 ) ;

computes and returns x+3 for every instance of entity set
"es1", which has the attribute "x".

retrieve ( {es1},{es2} ) by ( {v0} max (x) )
by ( {v1} y*4, min (y*z) ) ;

First, it gets the instance of "es1"'s attribute "x" which has
the highest value of all instances of "es1"'s attribute "x".

Second, it gets the "y*4" elements of entity set "es2", "y"
being an attribute of "es2"; then it gets the instance of
"es2"'s "y*z" which has the minimal value of all instances of
"es2"'s "y*z", "y" and "z" both being attributes of "es2".

retrieve ( {es1} where ( ( ( x1 < x2 ) and ( x3 = x4 ) ) ),
{es2} where ( y1 = ( 1,2,3,4,5 ) or y2 = y3 ),
{{v0}} where ( x1 | x3 = str ( b, 4:'$', 5:'$' ) ) )
by ( {v0} x1, x3 )
by ( {v1} y1, y2 ) ;

A complete set of predicate conditions on real and virtual
entity sets is supported, with "and", "or", and "xor" among
the connectors and "<", ">", and "=" among the relators. The
default order of evaluation is from left to right unless
otherwise indicated by the use of parentheses.

For instance, the following conditions are equivalent:

x1 = x2 and x2 > x3 or x3 < x4

(x1 = x2) and (x2 > x3) or (x3 < x4)

((x1 = x2) and (x2 > x3)) or x3 < x4

(((((x1 = x2))))) and x2 > x3 or (x3 < x4)
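Strict left-to-right combination of connectors can be sketched in Python as follows. The function name and its input layout, one list of relation results and one list of connectors, are assumptions. The example where the connectors differ shows why the rule matters: Python itself gives "and" higher precedence than "or", so the two approaches can disagree.

```python
def left_to_right(values, connectors):
    """Combine boolean results strictly left to right, with no precedence.

    values[i] is the truth value of the i-th relation; connectors[i] joins
    values[i] and values[i + 1].
    """
    result = values[0]
    for op, value in zip(connectors, values[1:]):
        if op == "and":
            result = result and value
        elif op == "or":
            result = result or value
        elif op == "xor":
            result = result != value
        else:
            raise ValueError(f"unknown connector {op!r}")
    return result


# a or b and c, read left to right, means (a or b) and c
print(left_to_right([True, False, False], ["or", "and"]))  # False
print(True or False and False)  # True: Python binds "and" first
```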

Each where clause is attached to the entity set specified
immediately before the clause itself. The only restriction on
which entity sets may have where clauses attached to them is
that they must not be one of the following:

{v0}, {v1}, {v2}, {v3}, {v4},

{v5}, {v6}, {v7}, {v8}, {v9}

This is because these entity sets only refer to some other
entity set that has already been specified. By the same
principle, the following entity sets may have associated
predicates, because they are themselves the specification of
new virtual entity sets:

{{v0}}, {{v1}}, ..., {{v9}}

The second entity set in the foregoing retrieve statement has
an associated predicate which specifies y1 = (1,2,3,4,5); this
predicate requires y1 to equal one of the constants within the
enclosing parentheses. When using this kind of comparison,
however, all values that appear within the enclosing
parentheses must be either arithmetic constants or string
constants.

The third condition clause in the foregoing retrieve statement
illustrates the use of the string functions; the function call
returns the substring of b from the 4th occurrence of the '$'
character to the 5th occurrence of the '$' character. The
predicate yields true if the result of concatenating x1 and x3
is equal to the return value of that function call.

retrieve ( { {es1} mi (x,y,z) {es2} } )
by ( {v0} weight ) ;

This statement retrieves all instances of the weight attribute
of the virtual entity set composed of the "multiple
intersection" (MI) of real entity sets es1 and es2, based on
the common attributes "x", "y", and "z".

Each virtual entity set, enclosed by a pair of left and right
braces, may itself be composed of two other virtual entity sets
as the result of a set operation; each of these two component
entity sets may in turn be composed of two other virtual entity
sets as the result of a set operation, and so on. In this
manner, virtual entity sets may be built very quickly one on
top of another, each with its own set of predicate conditions
to be met.

Five set operators are supported between two entity sets:
multiple union, multiple intersection, single union, single
intersection, and cartesian product; namely, MU, MI, SU, SI,
and CS. The semantics of these operators are described in the
co-thesis by Peter Lu.

usage -->  {{es1} SI (x) {es2}}

{{es1} su (y,z) {es2}}

{{es1} MU (y,z,z1) {es2}}

The operands of MU, MI, SU, and SI can be a list of attribute
names separated by commas, but the operands of CS must be two
in number, and the first one must be preceded by a " " sign to
mark its cartesian role.

{{es1} cs (id, class) {es3}}

An arbitrary WHERE clause representing a predicate condition
may follow every kind of prescribed virtual entity set:

{{es1} where (color = 'red') cs (id,class)
{es3} where (num > 7) } where (size < 5)

5.2.0  BUFFER  COMMANDS 

These  interactive  commands  may  be  issued  by  the  user  via 
a  terminal  session  with  the  virtual  information  facility. 
They  are  the  means  by  which  an  interactive  environment  is 
constructed  in  which  the  data  base  commands  may  be  executed. 

The buffer is divided into an execution buffer and a
transaction buffer. An adhoc dictionary is built for the
duration of each transaction, in which many data base
statements may be strung together and executed sequentially.
Thus, within a transaction a user may operate either on the
permanent dictionary, which is shared with every other
transaction executed before or after it, or on the adhoc
dictionary, which is for the transaction's exclusive use.

A  completed  data  base  statement  in  the  execution  buffer 
will  automatically  trigger  the  execution  of  that 
statement;  therefore,  the  execution  buffer  is  not  suited 
for  the  stringing  together  of  multiple  statements. 

Each  buffer  command  may  be  entered  from  within  either  the 
transaction  buffer  or  the  execution  buffer,  and  may  be 
recognized  by  two  or  more  initial  characters  of  its  full 
name,  furthermore,  the  contens  of  the  execution  buffer  and 
at  least  10  lines  of  the  transaction  buffer  will 
always  be  displayed  on  the  terminal. 
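The abbreviation rule can be illustrated with a short Python sketch. It assumes the thirteen command names listed in 5.2.1 and a hypothetical helper, `resolve_command`; the report does not describe how the buffer program actually performs the lookup. With these particular names every two-letter prefix is already unique, so a prefix of two or more characters identifies at most one command.

```python
from typing import Optional

# Command names as listed in section 5.2.1.
COMMANDS = [
    "FINPUT", "FSAVE", "TRANSACT", "ENDTRANS", "TERMINATE",
    "RUNTRANS", "DODELETE", "ERASETRANS", "KILLEXEC", "HELP",
    "INSERT", "DELETE", "TOPLINE",
]


def resolve_command(word: str) -> Optional[str]:
    """Return the full command name that `word` abbreviates, or None.

    At least two initial characters are required, and case is ignored
    because the facility does not distinguish upper and lower case.
    """
    word = word.upper()
    if len(word) < 2:
        return None
    matches = [name for name in COMMANDS if name.startswith(word)]
    # With these names a two-letter prefix already identifies one command,
    # so more than one match cannot occur; no match means "unknown".
    return matches[0] if len(matches) == 1 else None
```

For instance, `resolve_command("ru")` yields `"RUNTRANS"`, while `resolve_command("d")` yields `None` because one character is not enough.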

5.2.1  COMMAND SYNTAX

(1)  FINPUT 1starg

This command reads into the transaction buffer the contents of
the CMS file whose file name is "file" and whose file type is
whatever is entered as "1starg". The original content of the
transaction buffer is erased.

(2)  FSAVE 1starg

This command writes the contents of the transaction buffer into
the CMS file whose file name is "file" and whose file type is
whatever is entered as "1starg". Upon completion of the
command, the transaction buffer is empty.

(3)  TRANSACT

This command lets the user enter the transaction buffer.

(4)  ENDTRANS

This command lets the user leave the transaction buffer and
enter the execution buffer.

(5)  TERMINATE

This command terminates the virtual information facility and
returns control to CMS.

(6)  RUNTRANS

This command executes the contents of the transaction buffer,
statement by statement.

(7)  DODELETE

This command does the same as RUNTRANS, except that upon its
completion the contents of the transaction buffer are erased.

(8)  ERASETRANS

This command erases the contents of the transaction buffer.

(9)  KILLEXEC

This command erases the contents of the execution buffer.

(10)  HELP

This command gives a brief description of all buffer commands.

(11)  INSERT 1starg 2ndarg

This command inserts a line of text into the transaction
buffer. The first argument is the line number within the
buffer after which the new line is to be inserted, and the
second argument is the text to be inserted.

(12)  DELETE 1starg

This command deletes a line from the current transaction
buffer; the line number is specified by the first argument.

(13)  TOPLINE 1starg

This command specifies, through its first argument, the line
number of the first of the ten transaction buffer lines that
are always displayed.
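The three editing commands above, together with ERASETRANS, amount to simple list operations on numbered lines. The following Python sketch models them under stated assumptions: `TransactionBuffer` and its methods are hypothetical names, lines are numbered from 1, and INSERT with a first argument of 0 is assumed to place the text at the top of the buffer (the report does not say). FINPUT and FSAVE are omitted because they depend on CMS files.

```python
from typing import List

WINDOW = 10  # the report says ten transaction buffer lines are always shown


class TransactionBuffer:
    """Hypothetical model of the transaction buffer; lines numbered from 1."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.top = 1  # first displayed line, as set by TOPLINE

    def insert(self, after: int, text: str) -> None:
        # INSERT 1starg 2ndarg: place `text` after line `after`
        # (assumption: after == 0 inserts at the very top).
        if not 0 <= after <= len(self.lines):
            raise IndexError(f"no line {after}")
        self.lines.insert(after, text)

    def delete(self, line: int) -> None:
        # DELETE 1starg: remove the line with the given number.
        if not 1 <= line <= len(self.lines):
            raise IndexError(f"no line {line}")
        del self.lines[line - 1]

    def topline(self, line: int) -> None:
        # TOPLINE 1starg: choose the first of the displayed lines.
        self.top = max(1, line)

    def erase(self) -> None:
        # ERASETRANS: empty the buffer.
        self.lines.clear()
        self.top = 1

    def window(self) -> List[str]:
        # The transaction buffer lines currently visible on the terminal.
        return self.lines[self.top - 1:self.top - 1 + WINDOW]
```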

5.3.0  FORMAL BNF DESCRIPTION OF DATA BASE STATEMENTS

****************************************************************
<defstmt>   ::= < "DEFINE" | "DEF" > name defopt ;
<defopt>    ::= < "AS" <***> > | < "REMOVE" | "REM" >
<adhocstmt> ::= "ADHOC" name defopt ;
<liststmt>  ::= "LISTDEF" name ;
****************************************************************
<retrstmt>  ::= "RETRIEVE" ( vsets ) byspec ;
<vsets>     ::= < "{" vsets1 "}" { "where" ( cond ) } > | < vindrs >
                { , vsets }
<vsets1>    ::= name | combsets
<combsets>  ::= < "{" vsets1 "}" { "where" ( cond ) } >
              | < "{" vindrs "}" >
                { setop vsets2 }
<vsets2>    ::= < vsets1 { "where" ( cond ) } > | < vindrs >
<vindrs>    ::= < "V0" | "V1" | "V2" | "V3" | "V4" | "V5" | "V6"
                | "V7" | "V8" | "V9" > | < "{" vindrs "}" >
<setop>     ::= <cs> | <ncs>
<ncs>       ::= < "MI" | "SI" | "MU" | "SU" > ( reflist )
<reflist>   ::= ref1 { , ref1 }
<cs>        ::= "CS" ( varef , varef )

<exp>       ::= <(exp)> | exp-inf1 | exp-inf2 | exp-pre
              | exp-pwr | exp-prim
<exp-inf1>  ::= exp inf1-op <(exp)> | exp-inf2 | exp-prim
              | exp-pwr
<exp-inf2>  ::= exp2 inf2-op <(exp)> | exp-prim | exp-pwr
<exp2>      ::= <(exp2)> | exp-inf2 | exp-pre
              | exp-pwr | exp-prim
<inf1-op>   ::= + | - | "|"
<inf2-op>   ::= * | /
<exp-pre>   ::= pre-op < exp-prim | exp-pre | exp-pwr >
<pre-op>    ::= + | -
<exp-prim>  ::= ref | const
<exp-pwr>   ::= exp-prim ! < exp-prim | exp-pre >
<ref>       ::= ref1 | funref
<ref1>      ::= { } varef
<varef>     ::= name { ( varef ) }
<const>     ::= fixed | integer
<fixed>     ::= integer . { integer }
<digit>     ::= 0|1|2|3|4|5|6|7|8|9
<integer>   ::= digit | digit integer
****************************************************************

<funref>    ::= singfun | strfun
<singfun>   ::= funame ( exp )
<funame>    ::= "MAX" | "MIN" | "ABS" | "POS"
              | "SGN" | "ZER" | "SUM"
<strfun>    ::= "str" ( strarg )
<strarg>    ::= exp , exp { : exp } strarg1
<strarg1>   ::= { < @ exp { : exp } > | < , exp { : exp } > }
<cond>      ::= < ( cond ) > | cond1
<cond1>     ::= smpcnd | < cond condop cond2 >
<cond2>     ::= smpcnd | < ( cond ) >
<smpcnd>    ::= < ( smpcnd ) > | < { not } ( smpcnd ) >
              | < exp relop exp >
<not>       ::= "¬" | "NOT"
<condop>    ::= "AND" | "OR" | "XOR"
<relop>     ::= = | > | <
              | { not } =
              | { not } >
              | { not } <
<name>      ::= any character string of length less than 17,
                composed only of the following characters:
                abcdefghijklmnopqrstuvwxyz $ ?
****************************************************************

5.3.1  BNF SUPPLEMENT

* comments are enclosed within "\" characters

* quote characters within string constants are represented by
  two consecutive single quote characters

* string constants are enclosed by "'" (single quote)
  characters

* <***> designates any arbitrary character string

* single-line comments enclosed by "\" characters are permitted
  before, after, and within each data base statement, as well as
  before and after each buffer command line. They are eventually
  removed, and are not recognized as part of any input line.

* the system makes no distinction between lower- and upper-case
  characters.
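The lexical rules above can be sketched as a preprocessing pass over an input line. This is an illustrative Python sketch, not the facility's code: the function name `normalize` is hypothetical, and two points the report leaves open are treated as assumptions, namely that a "\" inside a string constant is an ordinary character and that string constants keep their original case.

```python
def normalize(line: str) -> str:
    """Remove \\...\\ comments and fold case outside string constants.

    Assumptions (not stated in the report): a backslash inside a string
    constant is an ordinary character, and string constants keep their case.
    """
    out = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch == "\\":
            # Comment: skip everything up to and including the closing "\".
            end = line.find("\\", i + 1)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 1
        elif ch == "'":
            # String constant: copy verbatim; '' stands for one quote character.
            j = i + 1
            while True:
                if j >= n:
                    raise ValueError("unterminated string constant")
                if line[j] == "'":
                    if j + 1 < n and line[j + 1] == "'":
                        j += 2  # doubled quote stays inside the constant
                        continue
                    break  # closing quote
                j += 1
            out.append(line[i:j + 1])
            i = j + 1
        else:
            out.append(ch.lower())
            i += 1
    return "".join(out)
```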
6.0.0  FINITE-STATE-MACHINE (PUSH-DOWN-AUTOMATON)


In this chapter, the finite-state machine used to parse data
base retrieval statements is briefly described. A finite-state
machine consists of a number of states, one of which is the
start state and one or more of which are ending states. There
is also a pointer to the current word being examined in the
user statement being processed. In our case, the user input is
always a retrieval statement written in the data base language
presented in Chapter 5, and each word is a single token of the
token chain into which the original retrieval statement has
already been transformed.

Each state within the machine has a set of match-next_state
rules, and together these sets of rules regulate the behavior
of the finite-state machine on any given input. Each state
attempts to find a match between the current word and the
match section of one of its rules. If a match is found,
control passes to the state identified by the next_state
section of the matching rule, and the input pointer advances
to the next word of the statement being processed. If the end
of the input is reached while control is in an ending state,
the machine halts and is said to have accepted the statement
it has just processed.
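Before the augmentation described next, the machine is an ordinary finite-state recognizer. A minimal Python sketch of that behaviour follows; the rule table shape, the function `accepts`, and the toy machine `TOY` (which accepts only `retrieve ( v0 ) ;`) are illustrative assumptions, not rules taken from "file machin".

```python
from typing import Dict, List, Set, Tuple

# Per state, an ordered list of (word to match, next state).
Rules = Dict[int, List[Tuple[str, int]]]


def accepts(rules: Rules, start: int, ending: Set[int], words: List[str]) -> bool:
    """Run a plain finite-state machine over a token chain."""
    state = start
    for word in words:
        for match, next_state in rules.get(state, []):
            if match == word:
                state = next_state  # the pointer then moves to the next word
                break
        else:
            return False  # no rule of the current state matches this word
    # Accept only if the input ends while control is in an ending state.
    return state in ending


# Hypothetical toy machine accepting exactly: retrieve ( v0 ) ;
TOY: Rules = {
    1: [("retrieve", 2)],
    2: [("(", 3)],
    3: [("v0", 4)],
    4: [(")", 5)],
    5: [(";", 6)],
}
```

Such a recognizer can only answer yes or no; producing an execution tree requires the stacks and action routines that the following paragraphs introduce.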


Our construction of such a finite-state machine immediately
ran into a number of problems. First, such a machine has no
provision for any processing other than moving from state to
state. We therefore extended our design to a push-down
automaton, which is a finite-state machine with auxiliary
memory and data movement capabilities. In fact, to generate an
execution tree and various tables from a given retrieval
statement, we had to augment the processing ability of the
automaton with a set of action routines, and to change the
format of the state rules to a triple consisting of a match
section, an action section, and a next_state section.

The auxiliary memory we chose for the automaton takes the form
of two stacks: an operand stack, stack #1, and an operator
stack, stack #2. The match section of each rule can match the
current word from three different sources: the input token
chain, the top of stack #1, and the top of stack #2. When the
source is the input token chain, a rule can also match any
token of a given class instead of one specific token. The
action section of each rule can push the current input token
onto either stack, pop either stack, and invoke other action
routines, which generate an execution tree and various tables
from the current elements on both stacks and modify the
current contents of the stacks. The next_state section of each
rule contains the ID number of the state to which control
passes after the action routines of the matching rule have
been executed. Our machine is deterministic: for a given
combination of input token, top of stack #1, and top of stack
#2, there is at most one next_state to which control may go
from the current state. A nondeterministic machine would have
been more compact, but also more complex and harder to
implement because of the need to backtrack over decision
points.

The construction of this push-down automaton is logically
divided into two parts: writing the match-action-next_state
rules, and writing a program that takes these rules as input
and sets up the environment in which data are matched, actions
are performed, and next states are entered, exactly as the
rules prescribe. The front-end of the virtual information
facility, as presented in this thesis, is responsible for
writing these match-action-next_state rules; the
implementation of these rules is the responsibility of the
back-end.

6.1.0  CONFIGURATION

The action routines are implemented in the back-end, written
by Peter Lu in his concurrent thesis. These programs are in
the PARSE module, as illustrated in Figure 4.1. As part of the
front-end, the match-action-next_state rules make up the
FINITE-STATE-MACHINE module and currently reside in CMS file
"file machin". A DEFMCH module establishes the machine
environment, taking the contents of "file machin" as input,
and the PARSE module, when called upon, activates the
finite-state machine.

6.2.0  MATCH-ACTION-NEXT_STATE RULES

Each match-action-next_state rule is a 3-tuple of information.
The first component prescribes what to match on one or more of
the following sources: the input stream, the top element of
stack #1, and the top element of stack #2. The second
component is a sequence of action routine invocations; these
routines are executed whenever the match component of the same
rule matches. The third component is a state number
representing the next state to which control goes once the
action routines of the same rule have been executed.

The three components of each rule are separated from each
other by the "\" character, as illustrated in the following:

\ match component \ action component \ next_state number \

Furthermore, since these rules prescribe the transitions from
state to state, they are referred to as "transition rules",
and the full specification of a transition rule is as follows:

t \ match component \ action component \ next_state number \

A state is specified by first writing the following line to
identify the state:

s \ state number \

and then listing the match-action-next_state rules that belong
to that state. The order of the rules in this list cannot be
changed, because when control comes to a state, its rules are
tried sequentially in the order of their position in the list.
A sample state specification follows:

s \ 25 \

t \ retrieve \ del \ 26 \
t \ ( \ pop,1 \ 27 \
t \ + \ pop,2 \ 36 \

The foregoing rules first try to match the word "retrieve" on
the input. If it matches, the action routine "del" (delete the
input token) is executed, and control goes to state 26. If the
first rule does not match, the second rule, which attempts to
match a "(" character on the input, is tried. If this rule
matches, the "pop" routine is executed, popping stack #1, and
control goes to state 27. If the second rule does not match,
the machine tries the third rule, which matches the "+"
operator on the input stream; if it matches, stack #2 is
popped and control goes to state 36. If none of the rules for
the current state matches, the machine signals premature
termination, which means that the input is invalid, and
diagnostic messages are sent to CMS file "file error". State
number 0 is the final state of the machine; if control is ever
passed to this state, the input is valid, accepted, and
successfully processed, and the associated execution tree and
entity set tables have already been generated and are
available to the following stages of the virtual information
facility.
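Because "file machin" is plain text in this format, loading it into per-state rule lists is straightforward. The Python sketch below is one way to do it, not the DEFMCH module itself: the `Rule` record and `load_machine` are hypothetical names, whitespace inside action calls is assumed insignificant (the report writes both `pop,1` and `pop, 1`), and a "," in the match component is taken to separate the three sources described next. The rules of a state are kept in their written order, since that order decides which rule is tried first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rule:
    match: Tuple[Optional[str], ...]  # (input, stack #1, stack #2); None = absent
    actions: List[str]                # e.g. ["push,2,i@", "del", "pop,1"]
    next_state: int                   # may be negative (subroutine return)


def load_machine(text: str) -> Dict[int, List[Rule]]:
    """Parse 's \\ n \\' and 't \\ match \\ action \\ next \\' lines.

    Rules stay in written order because they are tried in that order.
    """
    machine: Dict[int, List[Rule]] = {}
    state: Optional[int] = None
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\\")]
        if fields[0] == "s":
            state = int(fields[1])
            machine[state] = []
        elif fields[0] == "t" and state is not None:
            parts = [p.strip() or None for p in fields[1].split(",")] if fields[1] else []
            parts += [None] * (3 - len(parts))
            # Whitespace inside a call is dropped: "pop , 1" becomes "pop,1".
            actions = ["".join(a.split()) for a in fields[2].split("|") if a.strip()]
            machine[state].append(Rule(tuple(parts[:3]), actions, int(fields[3])))
    return machine


SAMPLE = r"""
s \ 25 \
t \ retrieve \ del \ 26 \
t \ ( \ pop , 1 \ 27 \
t \ + \ pop, 2 \ 36 \
"""
```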

The match component of each rule is composed of zero to three
parts, separated from one another by a "," character. The
first part represents the input source, the second represents
stack #1, and the third represents stack #2. If none of the
three parts is present, the rule matches anything, and its
action routines are always executed whenever the machine tries
that rule.

For instance:

s \ 1 \

t \ a,b,c \ del \ 2 \

The foregoing rule matches simultaneously a character "a" on
the input stream, a character "b" on top of stack #1, and a
character "c" on top of stack #2. All three sources must match
before the corresponding action routines may be executed. If
any of the sources does not match, the entire rule fails, and
either the next rule in sequence is tried or, if there are no
more rules for this state, the machine signals premature
termination. Thus, at most three sources may be matched in the
match component of a rule, and at most one token may be
matched on any one source.
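The matching and ordering rules just described can be written as two small functions. This Python sketch is illustrative only: rules are hypothetical tuples `(input, stack1, stack2, next_state)` with `None` for an absent part, only literal tokens are compared (clusters and classes are introduced later in this section), and `select` returning `None` stands for premature termination.

```python
from typing import List, Optional, Sequence, Tuple

# (input part, stack #1 part, stack #2 part, next_state); None = part absent.
Rule = Tuple[Optional[str], Optional[str], Optional[str], int]


def top(stack: List[str]) -> Optional[str]:
    return stack[-1] if stack else None


def rule_matches(rule: Rule, token: Optional[str],
                 s1: List[str], s2: List[str]) -> bool:
    """Every present part must equal its source; absent parts match anything."""
    found = (token, top(s1), top(s2))
    return all(want is None or want == got for want, got in zip(rule[:3], found))


def select(rules: Sequence[Rule], token: Optional[str],
           s1: List[str], s2: List[str]) -> Optional[Rule]:
    """Try a state's rules in written order; None means premature termination."""
    for rule in rules:
        if rule_matches(rule, token, s1, s2):
            return rule
    return None


# State 1 from the example above: t \ a,b,c \ del \ 2 \
STATE_1: List[Rule] = [("a", "b", "c", 2)]
```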

The action component of each rule may contain calls to more
than one action routine. These invocations are written in
sequential order, separated by the "|" character, and the
routines are executed in the order of their appearance in the
action component. For instance:

t \ + \ push,2,i@ | del | pop,1 \ 6 \

The foregoing rule matches the "+" character on the input
stream and then executes the three action routines in order.
First, it pushes the "+" character onto stack #2, as specified
by the first routine call; then it deletes the current token on
the input stream, thereby advancing the input pointer to the
next input token; and finally it pops stack #1, removing stack
#1's top element.
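An action component can be executed by splitting it on "|" and dispatching each call in turn. The Python sketch below covers only the stack routines POP, PUSH (or P) with the `i@` operand, and DEL, whose meanings are given in 6.3.0; `run_actions` is a hypothetical name, the input is modelled as a token list plus a position, and routines such as GENNODE or `c@` pushes are rejected rather than guessed at.

```python
from typing import Dict, List


def run_actions(component: str, tokens: List[str], pos: int,
                stacks: Dict[int, List[str]]) -> int:
    """Execute an action component such as 'push,2,i@ | del | pop,1'.

    Only POP, PUSH/P with an 'i@' operand, and DEL are modelled.
    Returns the new input position.
    """
    for call in component.split("|"):
        if not call.strip():
            continue
        name, *args = "".join(call.split()).split(",")
        name = name.lower()
        if name in ("push", "p"):
            target, operand = int(args[0]), args[1]
            if not operand.lower().startswith("i@"):
                raise NotImplementedError(operand)
            # 'i@' is the current input token; a suffix such as ':2' is appended.
            stacks[target].append(tokens[pos] + operand[2:])
        elif name == "pop":
            stacks[int(args[0])].pop()
        elif name == "del":
            pos += 1  # advance the input pointer to the next token
        else:
            raise NotImplementedError(name)
    return pos
```

Applied to the rule above with "+" as the current token, the sketch pushes "+" onto stack #2, advances the position by one, and pops stack #1, in that order.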


To facilitate the matching of a group of symbols, not
necessarily all of the same classification, we developed the
concept of a "cluster". A cluster is simply a union of one or
more prescribed symbols that may be matched under one cluster
name. All cluster names begin with an "@" character, and they
make transition rules more convenient to write. For instance,
an arithmetic cluster may include the + and - characters and
be named "@sumop". By using the name @sumop in the match
component, we may match either the + or the - character. We
currently have the following clusters:

@sumop   - + , -
@sumop>  - + , - , * , / , !
@multop  - * , /
@multop> - * , / , !
@conc    - |
@conc>   - | , + , -
@virt    - v0 , v1 , v2 , v3 , v4 , v5 , v6 , v7 , v8 , v9
@rel     - > , <
@cmp     - and , or , xor
@setop   - cs , mu , mi , su , si
It was mentioned earlier that a rule may match from the input
source a token of a specific classification. To do this, the
rule indicates the class it matches by writing a ":" character
followed by the class representation character. There are
altogether eight classes of tokens, namely the classes A, N,
D, O, B, Q, M, and S.

Characters belonging to class A are the following:

abcdefghijklmnopqrstuvwxyz

Characters  belonging  to  class  N  are  the  following: 

0123456789 

Characters  belonging  to  class  D  are  the  following: 


Characters belonging to class O are the following:


< 


+-/*'■  I 

Characters  belonging  to  class  B  are  the  following: 

-=><  , o°n 

Characters  belonging  to  class  Q  are  the  following: 


Characters  belonging  to  class  M  are  the  following: 

Characters  belonging  to  class  S  are  the  following: 

% 


The following rule matches a token of class M on the input
stream, does nothing, and then passes control to state 4.

s \ 3 \

t \ :M \ \ 4 \
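Literal tokens, clusters, and classes can share a single matching function for the input-source part of a rule. The Python sketch below is an assumption-laden illustration: it includes only the clusters whose members are fully legible above (@sumop, @cmp, @setop, @virt) and only classes A and N, and it treats a token as belonging to a class when every character of it does, which the report does not state. `part_matches`, `CLUSTERS`, and `CLASSES` are hypothetical names.

```python
import string
from typing import Dict, Set

# Only clusters whose members are fully legible in the report.
CLUSTERS: Dict[str, Set[str]] = {
    "@sumop": {"+", "-"},
    "@cmp": {"and", "or", "xor"},
    "@setop": {"cs", "mu", "mi", "su", "si"},
    "@virt": {f"v{d}" for d in range(10)},
}

# Assumption: a token is in a class when all of its characters are.
CLASSES: Dict[str, Set[str]] = {
    "A": set(string.ascii_lowercase),
    "N": set(string.digits),
}


def part_matches(part: str, token: str) -> bool:
    """Match the input-source part of a rule against one token."""
    token = token.lower()  # the facility ignores letter case
    if part.startswith("@") and part in CLUSTERS:
        return token in CLUSTERS[part]
    if part.startswith(":") and part[1:].upper() in CLASSES:
        chars = CLASSES[part[1:].upper()]
        return bool(token) and all(c in chars for c in token)
    return token == part.lower()  # otherwise a literal token
```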

The parsing of a language can sometimes be facilitated by
creating sub-parsers that parse a subset of the language. We
applied this idea by using a subroutine concept in the
finite-state machine. Two sub-machines were written: one to
parse expressions, and one to parse entity set conditions,
which calls on the expression parser. The idea is to leave a
subroutine return command and address on top of stack #2
before passing control to the sub-parser. When the sub-parser
finds a negative state number as the next_state, it should
find that return command and address on top of stack #2; it
pops that element off stack #2 and then passes control to the
state identified by the command. With this strategy, the last
rule of the sub-parser, which indicates a successful parse,
must have a negative state number in its next_state
component. For example, the following rule passes control to
a sub-parser that starts at state 30, and specifies 80 as the
return state to be used when the sub-parser reaches a negative
state number in its next_state component:


s \ 20 \

t \ d \ del | push,2,subr:80 \ 30 \
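The call-and-return convention can be sketched as a small driver loop. In this Python sketch the rule shape `(input token or None, actions, next_state)` and the function `run` are hypothetical, only the input source is matched, and an empty or non-`subr:` stack #2 at a negative next_state is treated as an error, which the report does not specify. State 0 accepts and a state with no matching rule terminates prematurely, as described in 6.2.0. The toy machine reuses the state numbers of the example above: state 20 calls the sub-parser at state 30, which accepts one `x` and returns to state 80.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule shape: (input token or None for "anything", actions, next_state).
Rule = Tuple[Optional[str], List[str], int]


def run(machine: Dict[int, List[Rule]], start: int, tokens: List[str]) -> bool:
    """Drive the machine until state 0 (accept) or premature termination."""
    state, pos = start, 0
    s1: List[str] = []
    s2: List[str] = []
    while state != 0:
        token = tokens[pos] if pos < len(tokens) else None
        for want, actions, next_state in machine.get(state, []):
            if want is None or want == token:
                break
        else:
            return False  # premature termination: no rule of this state matched
        for action in actions:
            op, *args = action.split(",")
            if op == "del":
                pos += 1
            elif op == "push":
                stack = s1 if args[0] == "1" else s2
                stack.append(token if args[1] == "i@" else args[1])
            elif op == "pop":
                (s1 if args[0] == "1" else s2).pop()
        if next_state < 0:
            # Subroutine return: the caller left 'subr:<state>' on stack #2.
            if not s2 or not s2[-1].startswith("subr:"):
                return False
            next_state = int(s2.pop().split(":", 1)[1])
        state = next_state
    return True


# Toy machine: state 20 calls the sub-parser at state 30, which returns to 80.
TOY: Dict[int, List[Rule]] = {
    20: [("d", ["del", "push,2,subr:80"], 30)],
    30: [("x", ["del"], -1)],
    80: [(";", ["del"], 0)],
}
```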

6.3.0  ACTION ROUTINES

The following action routines are written within the back-end
of the facility and are available for use in the action
component of the match-action-next_state rules:

Routine      Usage          Semantics

POP          pop,1          pops stack #1
             pop,2          pops stack #2

PUSH or P    push,1,i@      pushes the input token onto stack #1
             p,1,i@

             push,1,c@      pushes the input token, concatenated
                            with ":" and its classification,
                            onto stack #1

             push,2,i@:2    pushes the input token, concatenated
                            with ":2", onto stack #2.
                            Frequently, this is used to
                            associate the expected number of
                            operands with the operator being
                            pushed onto the operator stack.

DEL          del            advances the input pointer to the
                            next input token.

Non-stack-related routines:

GENNODE or GD   generates an operator node with its specified
                number of children, which are found on stack
                #1, as a partial execution tree. The address of
                the partial tree just generated is placed back
                on top of stack #1.

INDX            adds the first level of indirection to a data
                element

ADDON           adds an additional level of indirection

VIRTX           exchanges the address of the indicated virtual
                entity set

MULX            generates a multiple "OR" node

ATTWHR          attaches the address of a condition to its
                associated virtual entity set

GENENT          generates a new virtual entity set

VIRTA           adds the current virtual entity set to an
                entity set table

The non-stack-related routines listed above have much to do
with the internal workings of the back-end and are explained
more precisely in the back-end documentation.
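GENNODE is what turns the two stacks into an execution tree. The Python sketch below shows one plausible reading and is not the back-end's code: it assumes the operator sits on top of stack #2 in the `op:k` form produced by `push,2,i@:2`, that GENNODE pops it, and that the leftmost child is the deepest of the k entries taken from stack #1. `Node` and `gennode` are hypothetical names, and a Python object stands in for the tree address.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    op: str
    children: List["Operand"] = field(default_factory=list)


Operand = Union[str, Node]


def gennode(s1: List[Operand], s2: List[str]) -> None:
    """Hypothetical GENNODE: build an operator node from the two stacks.

    Assumes 'op:k' on top of stack #2 and the k children on top of
    stack #1, leftmost child deepest.
    """
    op, arity = s2.pop().rsplit(":", 1)
    k = int(arity)
    if k > len(s1):
        raise ValueError(f"{op} needs {k} operands")
    children = s1[len(s1) - k:]
    del s1[len(s1) - k:]
    s1.append(Node(op, children))  # the new partial tree goes back on stack #1
```

For `a + b` followed by `* c`, two calls leave a single tree on stack #1 whose root is `*` and whose left child is the `+` node.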

6.4.0  LISTING

The rules written in CMS file "file machin" are not readily
readable because of their syntax; a FORMMCH program is
available to format them into a readable form, as shown in the
listing of the rules on the following pages:

FINITE-STATE-MACHINE RULES
MATCH - ACTION - NEXT_STATE

STATE NUMBER: 40

40.1   ),.(   DEL|POP,2|GENNODE
40.2   %0     DEL|MULX,1

STATE NUMBER: 83

83.1   )      PUSH,2,I@|DEL
83.2   %0     PUSH,2,I@|DEL
83.3   (      PUSH,2,I@|DEL|INDX

STATE NUMBER: 103

103.1  .,(    POP,2
103.2         PUSH,2,SUBR:52


7.0.0  MAJOR DESIGN ISSUES

The following are some of the decisions which we had to make
in the design of the virtual information facility.

7.1.0  FORM OF STORAGE FOR VIRTUAL DEFINITIONS

How should virtual definitions be stored? We had two viable
alternatives. One is to store the definitions just as they
are, in the form of character strings; when a definition is
used, it is substituted within the actual data base retrieval
statement in place of the virtual definition name. The
alternative is to parse the definitions ahead of time and
generate the associated execution tree and entity set tables;
when a definition is actually used, its partial execution tree
is simply attached to the main execution tree as an extended
subtree, and the main entity set tables are augmented to
include the partial tables built from the definition.

The method of parsing the definitions first is similar to
compilation: when a definition is used, it need not be
processed again and again. The method of storing the
definitions as they are is similar to interpretation: each
time a definition is used, the entire process of parsing, tree
building, and table generation has to be repeated.

Storing definitions as they are enhances the flexibility of
virtual information. Definitions may be created, modified, and
even deleted with great ease and efficiency. Furthermore, it
eliminates the need to reconstruct a definition's text when
users request a listing of the stored definitions. It would
also enable a generalized macro facility in which not only
legitimate and coherent definitions may be stored, but also
seemingly illogical and incoherent ones.

Parsing the definitions as soon as they are defined is not an
easy task. Often, without the proper context in which the
definitions are to be used, their semantics are not clear.
Even if we could get around this problem by restricting the
potential contexts in which each definition may be used, we
would still encounter complicated problems in frequent tree
manipulation. Re-shaping an execution tree is a very "messy"
task, prone to erroneous branch connections, and traversing a
huge tree is not a particularly efficient operation.

Mainly for the foregoing reasons, we decided on the first
strategy: virtual definitions are stored as they are until
they are invoked.
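Storing definitions as text reduces invocation to textual substitution, much like a macro processor. The Python sketch below illustrates that idea and is not the facility's code: the `Dictionary` class, its method names, the identifier pattern `WORD`, and the depth limit `MAX_DEPTH` are all hypothetical, and for brevity it also rewrites words inside string constants. Names are looked up in lower case because the language ignores case.

```python
import re
from typing import Dict

MAX_DEPTH = 20  # hypothetical guard against definitions that refer to themselves
WORD = re.compile(r"[A-Za-z$?][A-Za-z0-9$?]*")  # assumed identifier shape


class Dictionary:
    """Definition store that keeps each definition as raw text."""

    def __init__(self) -> None:
        self.defs: Dict[str, str] = {}

    def define(self, name: str, text: str) -> None:   # DEFINE name AS text ;
        self.defs[name.lower()] = text

    def remove(self, name: str) -> None:               # DEFINE name REMOVE ;
        self.defs.pop(name.lower(), None)

    def listdef(self, name: str) -> str:               # LISTDEF name ;
        return self.defs[name.lower()]  # stored text is returned as-is

    def expand(self, statement: str) -> str:
        """Substitute stored text for defined names until nothing changes."""
        for _ in range(MAX_DEPTH):
            new = WORD.sub(
                lambda m: self.defs.get(m.group(0).lower(), m.group(0)), statement)
            if new == statement:
                return new
            statement = new
        raise RecursionError("definitions nest too deeply")
```

Listing a definition returns the stored text directly, which is the advantage the report cites; the price is that every expansion must be parsed again.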
7.2.0  PARSER STRUCTURE


Two strategies were given serious consideration for parsing
data base statements written in the language specified in
Chapter 5. One method is to construct a FINITE-STATE-MACHINE
with a set of match-action-next_state rules that correspond to
the grammar rules of the data base language. In this way, the
match-action-next_state rules are inputs to the actual parser,
just like the data base statements that are to be parsed; with
this approach, changes in grammar rules correspond directly to
changes in the machine rules. The other method is to construct
a conventional parser in which the grammar rules are part of
the parser program itself.

The finite-state-machine strategy is highly modular and gives
the data base language added flexibility in terms of
modifiability; however, it would be the first such machine
ever written by the author. The decision was made in favor of
the finite-state machine because the definite gains of this
approach seemed to outweigh the risk of failure in its
implementation.

7.3.0  PROGRAM CONTROL STRUCTURE


A decision was made to implement a centralized and horizontal
structure for passing program control from one module to
another. The alternative is a vertical control structure, in
which modules are nested one within another, and control may
propagate many levels deep before suddenly jumping back out to
the top. The centralized control structure features an
activity coordinator to which control must return from each
module before it is passed to another. Although the vertical
approach may seem more natural, the horizontal approach is
better suited to the idea of a single virtual information
level within the hierarchical design of INFOPLEX. Furthermore,
the horizontal approach contributes more to program modularity
because it treats each module as a separate, un-nested entity.
For these reasons, the decision was made to build a
centralized and horizontal control structure.
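The difference between the two structures is easiest to see in the shape of the main loop. In the hypothetical Python sketch below, each module does its work on a shared context and returns the name of the module that should run next (or `None` when the request is finished); control always comes back to `coordinator` between modules, so no module calls another directly. Apart from PARSE, which the report names, the modules in the usage example are illustrative stand-ins.

```python
from typing import Callable, Dict, List, Optional

# A module works on the shared context and names the next module (None = done).
Module = Callable[[dict], Optional[str]]


def coordinator(modules: Dict[str, Module], first: str, ctx: dict) -> List[str]:
    """Centralized, horizontal control: every module returns here.

    No module calls another; the coordinator looks up whichever module
    the previous one named. Returns the order in which modules ran.
    """
    trace: List[str] = []
    current: Optional[str] = first
    while current is not None:
        trace.append(current)
        current = modules[current](ctx)  # control comes back after each module
    return trace


def parse(ctx: dict) -> Optional[str]:  # stand-in for the PARSE module
    ctx["tree"] = ctx["statement"].split()
    return "execute"


def execute(ctx: dict) -> Optional[str]:  # hypothetical downstream module
    ctx["result"] = len(ctx["tree"])
    return None


MODULES: Dict[str, Module] = {"parse": parse, "execute": execute}
```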

7.4.0  INTERACTIVE EDITOR

Consideration was given to whether or not to build an
interactive, real-time, full-screen editor, similar to a
miniature EMACS or XEDIT, as part of the user interface
developed for the virtual information facility. How seriously
the idea deserved to be considered remains an open question to
this day. The argument against it is that the buffer program
already supports inputting the transaction buffer content from
an arbitrary CMS file; a user of virtual information may
readily use the XEDIT editor available on CMS to edit a
transaction stored in a CMS file, and later input that
transaction to the transaction buffer through the FINPUT
buffer command. An interactive, full-screen editor of our own
would take some effort to develop, and still would not be
nearly as powerful as XEDIT. In other words, resources might
be better spent on other areas of the virtual information
facility. The argument for such an editor is simply that it
would provide the added flexibility of modifying buffer
contents from within the virtual information facility.

Finally, a decision was made to build a primitive line editor
with display capabilities. This is a compromise between the
two extremes: few man-hours would be spent building such an
editor, and it would give users of virtual information added
flexibility and convenience in being able to edit their
transaction buffer content from within the facility.

7.5.0  LANGUAGE DESIGN


A decision was made to support infix arithmetic and string
operators instead of operators in prefix or Lisp notation.
Although infix operators give rise to a language that is more
difficult to parse, they are more user-friendly. A decision was
also made to support explicitly overriding the natural operator
precedences with parentheses in arithmetic, string, and boolean
expressions. This capability makes the parsing process more
difficult, but gives much added power and flexibility to the
language. In essence, the advantages of infix operators and of
parentheses for specifying precedence are considered well
worth their cost of implementation.
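The parsing cost mentioned above comes from precedence and parentheses, and the usual remedy uses the same two stacks as the automaton. The Python sketch below is a hand-written shunting-yard parser, swapped in for illustration rather than taken from the facility's table-driven rules: `parse_infix`, `PREC`, and the tuple tree format are hypothetical, the binding strengths are read off the <exp> grammar in 5.3.0 (+, - and the concatenation operator | loosest, * and / tighter, ! tightest), treating ! as right-associative is an assumption, and prefix signs and function calls are omitted.

```python
import re
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Binding strength read off the <exp> grammar.
PREC = {"+": 1, "-": 1, "|": 1, "*": 2, "/": 2, "!": 3}
RIGHT_ASSOC = {"!"}  # assumption
TOKEN = re.compile(r"\d+(?:\.\d*)?|[A-Za-z$?]\w*|[-+*/!|()]")


def parse_infix(text: str) -> Tree:
    """Shunting-yard parse of binary infix operators with parentheses."""
    operands: List[Tree] = []   # plays the role of stack #1
    operators: List[str] = []   # plays the role of stack #2

    def reduce() -> None:
        # Like GENNODE: pop one operator and two operands, push the new node.
        if len(operands) < 2:
            raise ValueError("missing operand")
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in TOKEN.findall(text):
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators and operators[-1] != "(":
                reduce()
            if not operators:
                raise ValueError("unbalanced parentheses")
            operators.pop()  # discard the matching "("
        elif tok in PREC:
            # Reduce while the stacked operator binds more tightly, or
            # equally tightly when the incoming one is left-associative.
            while operators and operators[-1] != "(" and (
                    PREC[operators[-1]] > PREC[tok]
                    or (PREC[operators[-1]] == PREC[tok] and tok not in RIGHT_ASSOC)):
                reduce()
            operators.append(tok)
        else:
            operands.append(tok.lower())  # name or constant
    while operators:
        if operators[-1] == "(":
            raise ValueError("unbalanced parentheses")
        reduce()
    if len(operands) != 1:
        raise ValueError("malformed expression")
    return operands[0]
```

For example, `a + b * c` yields `("+", "a", ("*", "b", "c"))`, while `(a + b) * c` yields `("*", ("+", "a", "b"), "c")`: the parentheses override the natural precedence exactly as the language design intends.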
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8.0.0  CONCLUSION 


Progress was steady and swift throughout the first two months of design and implementation. Then, as time passed, increasing hours of work were required to complete all thesis objectives promptly. Eventually, all available time was devoted to thesis work, and effort on academic courses became nearly nonexistent.

Finally, the complete design and an initial version of the implementation were completed and integrated with Peter Lu's back-end, setting up the first operational virtual information facility on the INFOPLEX software test vehicle, a software simulation of the INFOPLEX data base computer. An extensive set of improvised test cases was written and run on the facility. The internal interface to the back-end, namely, the instructions issued within the finite-state rules, does conform to expectations. Although the back-end is not yet able to integrate with the lower level of INFOPLEX to access real data, it is able to generate correct information requests based on the execution tree and entity set table established through the finite-state-machine instructions issued within the front-end.

The results of the implementation support the design decisions that were made, especially the decision to construct a finite-state machine and to keep virtual definitions as they are. Most thesis objectives were achieved; the main exception is that more rigorous test cases are still needed to establish the integrity of the facility. When eventually integrated with the next level of the INFOPLEX hierarchy, this facility should greatly extend the power and capability of the data base.

FINITE-STATE-MACHINE RULES
MATCH - ACTION - NEXT_STATE


STATE NUMBER: 84

84. 1  PUSH,1,IP[DEL


STATE  NUMBER:  102 

102. 1  ..(  POP,2

102. 2  PUSH,2,SUBR:52


STATE  NUMBER:  127 

127. 1  POP,2

127. 2  PUSH,2,SUBR:66


STATE NUMBER: 147

MACHINE  DEFINITION  COMPLETE 
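The listing above is only a fragment, and the meaning of its numeric fields (for example `POP,2` or `PUSH,2,SUBR:52`) is not spelled out in this section. As a rough illustration of how a table of MATCH - ACTION - NEXT_STATE rules with PUSH and POP actions can be executed, the following Python sketch interprets a small hypothetical rule table. The `Rule` encoding, the `NEXT` action, the `run` function, and the example grammar are all assumptions made for the sketch, not the format used by the facility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    match: Optional[str]              # token to consume; None matches without consuming
    action: str                       # "NEXT", "PUSH" or "POP" (hypothetical encoding)
    next_state: Optional[int] = None  # NEXT: state to enter; PUSH: state to resume in
    subr: Optional[int] = None        # PUSH: start state of the sub-machine


def run(machine, start, tokens):
    """Run a rule table; return True if the top-level machine POPs at end of input."""
    state, pos, stack = start, 0, []
    while True:
        # Rules are tried in order; the first one whose MATCH fits is taken.
        for rule in machine[state]:
            if rule.match is None or (pos < len(tokens) and tokens[pos] == rule.match):
                break
        else:
            raise SyntaxError(f"no rule in state {state} accepts token {pos}")
        if rule.match is not None:
            pos += 1
        if rule.action == "PUSH":
            stack.append(rule.next_state)  # remember where to resume
            state = rule.subr              # enter the sub-machine
        elif rule.action == "POP":
            if not stack:
                return pos == len(tokens)  # top level finished
            state = stack.pop()            # resume the caller
        else:                              # "NEXT": plain transition
            state = rule.next_state


# Hypothetical grammar: a parenthesized list whose items are "x" or nested lists.
MACHINE = {
    1: [Rule("(", "NEXT", 2)],
    2: [Rule("x", "NEXT", 2),
        Rule(")", "POP"),
        Rule(None, "PUSH", next_state=2, subr=1)],  # nested list
}
```

The interpreter is deterministic (first matching rule wins, no backtracking) and has no guard against cycles of non-consuming rules; a production rule table would need both properties checked when the machine definition is loaded.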